ETAPS Foreword


Welcome to the 27th ETAPS! ETAPS 2024 took place in Luxembourg City, the 
beautiful capital of Luxembourg. 

ETAPS 2024 is the 27th instance of the European Joint Conferences on Theory and 
Practice of Software. ETAPS is an annual federated conference established in 1998, 
and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each con- 
ference has its own Program Committee (PC) and its own Steering Committee (SC). 
The conferences cover various aspects of software systems, ranging from theoretical 
computer science to foundations of programming languages, analysis tools, and formal 
approaches to software engineering. Organising these conferences in a coherent, highly 
synchronized conference programme enables researchers to participate in an exciting 
event, having the possibility to meet many colleagues working in different directions in 
the field, and to easily attend talks of different conferences. On the weekend before the 
main conference, numerous satellite workshops took place that attracted many 
researchers from all over the globe. 

ETAPS 2024 received 352 submissions in total, 117 of which were accepted, 
yielding an overall acceptance rate of 33%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con- 
tributions, and in particular the PC (co-)chairs for their hard work in running this entire 
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2024 featured the unifying invited speakers Sandrine Blazy (University of 
Rennes, France) and Lars Birkedal (Aarhus University, Denmark), and the invited 
speakers Ruzica Piskac (Yale University, USA) for TACAS and Jérôme Leroux 
(Laboratoire Bordelais de Recherche en Informatique, France) for FoSSaCS. Invited 
tutorials were provided by Tamar Sharon (Radboud University, the Netherlands) on 
computer ethics and David Monniaux (Verimag, France) on abstract interpretation. 

As part of the programme we had the first ETAPS industry day. The goal of this day 
was to bring industrial practitioners into the heart of the research community and to 
catalyze the interaction between industry and academia. The day was organized by 
Nikolai Kosmatov (Thales Research and Technology, France) and Andrzej Wasowski 
(IT University of Copenhagen, Denmark). 

ETAPS 2024 was organized by the SnT - Interdisciplinary Centre for Security, 
Reliability and Trust, University of Luxembourg. The University of Luxembourg was 
founded in 2003. The university is one of the best and most international young 
universities with 6,000 students from 130 countries and 1,500 academics from all over 
the globe. The local organisation team consisted of Peter Y.A. Ryan (general chair), 
Peter B. Roenne (organisation chair), Maxime Cordy and Renzo Gaston Degiovanni 
(workshop chairs), Magali Martin and Isana Nascimento (event managers), Marjan
Skrobot (publicity chair), and Afonso Arriaga (local proceedings chair). This team also
organised the online edition of ETAPS 2021, and now we are happy that they agreed to
also organise a physical edition of ETAPS. 

ETAPS 2024 is further supported by the following associations and societies: 
ETAPS e.V., EATCS (European Association for Theoretical Computer Science), 
EAPLS (European Association for Programming Languages and Systems), and EASST 
(European Association of Software Science and Technology). 

The ETAPS Steering Committee consists of an Executive Board, and representa- 
tives of the individual ETAPS conferences, as well as representatives of EATCS, 
EAPLS, and EASST. The Executive Board consists of Marieke Huisman (Twente, 
chair), Andrzej Wasowski (Copenhagen), Thomas Noll (Aachen), Jan Kofron (Prague), 
Barbara König (Duisburg), Arnd Hartmanns (Twente), Caterina Urban (Inria), Jan 
Křetínský (Munich), Elizabeth Polgreen (Edinburgh), and Lenore Zuck (Chicago). 

Other members of the steering committee are: Maurice ter Beek (Pisa), Dirk Beyer 
(Munich), Artur Boronat (Leicester), Luis Caires (Lisboa), Ana Cavalcanti (York), 
Ferruccio Damiani (Torino), Bernd Finkbeiner (Saarland), Gordon Fraser (Passau), 
Arie Gurfinkel (Waterloo), Reiner Hahnle (Darmstadt), Reiko Heckel (Leicester), 
Marijn Heule (Pittsburgh), Joost-Pieter Katoen (Aachen and Twente), Delia Kesner 
(Paris), Naoki Kobayashi (Tokyo), Fabrice Kordon (Paris), Laura Kovacs (Vienna), 
Mark Lawford (Hamilton), Tiziana Margaria (Limerick), Claudio Menghi (Hamilton 
and Bergamo), Andrzej Murawski (Oxford), Laure Petrucci (Paris), Peter Y.A. Ryan 
(Luxembourg), Don Sannella (Edinburgh), Viktor Vafeiadis (Kaiserslautern), Stepha- 
nie Weirich (Pennsylvania), Anton Wijs (Eindhoven), and James Worrell (Oxford). 

I would like to take this opportunity to thank all authors, keynote speakers, atten- 
dees, organizers of the satellite workshops, and Springer Nature for their support. 
ETAPS 2024 was also generously supported by a RESCOM grant from the Luxem- 
bourg National Research Foundation (project 18015543). I hope you all enjoyed 
ETAPS 2024. 

Finally, a big thanks to both Peters, Magali and Isana and their local organization 
team for all their enormous efforts to make ETAPS a fantastic event. 


April 2024 Marieke Huisman 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


These proceedings volumes contain papers that were presented at the 33rd European 
Symposium on Programming (ESOP 2024), held during April 6-11 in Luxembourg 
City, Luxembourg, along with associated artifact reports. ESOP is part of the European 
Joint Conferences on Theory and Practice of Software (ETAPS) and promotes the 
specification, design, analysis and implementation of programming languages and 
systems. 

In total, these two volumes include 25 research papers, one “fresh perspective” and 
four “artifact reports”. The latter two paper categories are new to ESOP. In addition to 
standard research papers, the ESOP 2024 call-for-papers included the new submission 
categories: “fresh perspectives” that provide new insights in a particularly elegant way 
and “experience reports” that describe tools and systems used in practice. Furthermore, 
authors of accepted papers were allowed to submit short “artifact reports”, to appear 
together with their research papers, that describe associated software, tools, data sets, or 
machine checked proofs to substantiate the claims made in their papers. 

The papers in this volume were selected from 66 papers submitted in the research 
paper category and 6 papers submitted in the “fresh perspectives” category. There were 
no submissions for “experience reports”. While papers in these new categories had 
strict formatting requirements, ESOP 2024 allowed research papers to be submitted in 
any format, of any length, under the advisement that the final paper should be formatted 
to fit this volume. Fourteen submissions took advantage of this flexibility. 

Each submitted paper received at least three reviews by the members of the ESOP 
program committee. The median PC member was assigned eight papers to review over 
the seven-week review period. In some cases, PC members solicited additional reviews
to aid in the decision making process. In total, 39 external reviewers added their insight 
to the paper selection process. ESOP employed full double-blind review and author 
identities were only revealed to reviewers on paper acceptance. Authors were also 
given a chance to respond to their reviews, before the program was selected through a 
two-week online, asynchronous PC meeting, facilitated by the EasyChair system. The
program chair had no conflicts with any submitted paper. 

ESOP 2024 also employed an artifact evaluation process. Nineteen of the 26 
accepted papers elected to make their artifacts available on the archive sites Zenodo and 
figshare. The committee awarded the badge “Functional” to five of these and the 
badges “Functional and reusable” to the remaining fourteen. Four accepted papers in 
this volume are accompanied by artifact reports. These reports were all accepted fol- 
lowing a light review by both the program committee and the ESOP/FASE/FoSSaCS 
joint artifact evaluation committee. 

Indeed, my sincere thanks go to all who worked together to produce this event and 
its proceedings. Foremost, to the authors, who provided the technical content of the 
meeting. Also to the program committee, artifact evaluation committee, and external 
reviewers, who provided their well-reasoned and detailed judgments, sometimes on
short notice. Tobias Kappé, as the representative for ESOP among the artifact evaluation
committee co-chairs, deserves particular thanks. I also would like to thank the
ETAPS steering committee and its chair Marieke Huisman, the Proceedings coordi- 
nator Barbara König and the local proceedings chair Afonso Delerue Arriaga, and 
webmaster Jan Kofroň for their assistance in fitting ESOP together with the entire 
ETAPS meeting. Finally, thanks are due to the members of the ESOP steering com- 
mittee. In particular, Luis Caires, as chair of the SC, was a constant source of support, 
encouragement, information and guidance. 


April 2024 Stephanie Weirich 
ESOP PC Chair


Circuit Width Estimation via Effect Typing
and Linear Dependency 


Andrea Colledan^{1,2} and Ugo Dal Lago^{1,2}


1 University of Bologna, Bologna, Italy 
2 INRIA Sophia Antipolis, Valbonne, France 
{andrea.colledan,ugo.dallago}@unibo.it 


Abstract. Circuit description languages are a class of quantum pro- 
gramming languages in which programs are classical and produce a de- 
scription of a quantum computation, in the form of a quantum circuit. 
Since these programs can leverage all the expressive power of high-level 
classical languages, circuit description languages have been successfully 
used to describe complex and practical quantum algorithms, whose cir- 
cuits, however, may involve many more qubits and gate applications 
than current quantum architectures can actually muster. In this paper, 
we present Proto-Quipper-R, a circuit description language endowed with 
a linear dependent type-and-effect system capable of deriving paramet- 
ric upper bounds on the width of the circuits produced by a program. 
We prove both the standard type safety results and that the resulting 
resource analysis is correct with respect to a big-step operational se- 
mantics. We also show that our approach is expressive enough to verify 
realistic quantum algorithms. 


Keywords: Effect Typing - Lambda Calculus - Quantum Computing - 
Quipper 


1 Introduction 


With the promise of providing efficient algorithmic solutions to many problems
[11,27,31], some of which are traditionally believed to be intractable [54],
quantum computing is the subject of intense investigation by various research 
communities within computer science, not least that of programming language 
theory [24,43,51]. Various proposals for idioms capable of tapping into this new
computing paradigm have appeared in the literature since the late 1990s. Some 
of these approaches turn out to be fundamentally new M495A, while many 
others are strongly inspired by classical languages and traditional programming 
paradigms 4148.33.63]. 

One of the major obstacles to the practical adoption of quantum algorithmic 
solutions is the fact that despite huge efforts by scientists and engineers alike, it 
seems that reliable quantum hardware, contrary to classical one, does not scale 
too easily: although quantum architectures with up to a couple hundred qubits 
have recently seen the light [O38], it is not yet clear whether the so-called 
quantum advantage [45] is a concrete possibility, given the tremendous challenges 
posed by the quantum decoherence problem [50]. 
© The Author(s) 2024

This entails that software which makes use of quantum hardware must be
designed with great care: whenever part of a computation has to be run on quan- 
tum hardware, the amount of resources it needs, and in particular the amount 
of qubits it uses, should be kept to a minimum. More generally, a fine control 
over the low-level aspects of the computation, something that we willingly ab- 
stract from when dealing with most forms of classical computation, should be 
exposed to the programmer in the quantum case. This, in turn, has led to the 
development and adoption of many domain-specific programming languages and 
libraries in which the programmer explicitly manipulates qubits and quantum 
circuits, while still making use of all the features of a high-level classical programming
language. This is the case of the Qiskit and Cirq libraries [I7], but
also of the Quipper language [25,26].


At the fundamental level, Quipper is a circuit description language embedded 
in Haskell. Because of this, Quipper inherits all the expressiveness of the high 
level, higher-order functional programming language that is its host, but for the 
same reason it also lacks a formal semantics. Nonetheless, over the past few 
years, a number of calculi, collectively known as the Proto-Quipper language 
family, have been developed to formalize interesting fragments and extensions 
of Quipper in a type-safe manner [46,48]. Extensions include, among others,
dynamic lifting [8,21,35] and dependent types [20,22], but resource analysis is
still a rather unexplored research direction in the Proto-Quipper community [56]. 


The goal of this work is to show that type systems indeed enable the possibil- 
ity of reasoning about the size of the circuits produced by a Proto-Quipper pro- 
gram. Specifically, we show how linear dependent types in the style of Gaboardi 
and Dal Lago [12,14,15,23] can be adapted to Proto-Quipper, allowing us to derive
upper bounds on circuit widths that are parametric on the number of input 
wires to the circuit, be they classical or quantum. This enables a form of static 
analysis of the resource consumption of circuit families and, consequently, of the 
quantum algorithms described in the language. Technically, a key ingredient of 
this analysis, besides linear dependency, is a novel form of effect typing in which 
the quantitative information coming from linear dependency informs the effect 
system and allows it to keep circuit widths under control. 


The rest of the paper is organized as follows. Section 2 informally explores
the problem of estimating the width of circuits produced by Quipper, while
also introducing the language. Section 3 provides a more formal definition of
the Proto-Quipper language. In particular, it gives an overview of the system of
simple types due to Rios and Selinger [46], which however is not meant to reason
about the size of circuits. We then move on to the most important technical
contribution of this work, namely the linear dependent and effectful type system,
which is introduced in Section 4 and proven to guarantee both type safety and a
form of total correctness in Section 5. Section 6 is dedicated to an example of a
practical application of our type and effect system, that is, a program that builds
the Quantum Fourier Transform (QFT) circuit and which is verified to
do so without any ancillary qubits.

To conclude this introduction, we wish to emphasize that while it is true
that quantum computing can be a difficult and intimidating subject, the class 
of languages analyzed in this work focuses on circuit construction, which is an 
entirely classical process, paying little to no concern to the actual quantum 
semantics of circuit execution. Because of this, and due to space constraints, 
we refrain from providing a general introduction to quantum computing in this 
paper. Instead, we refer the interested reader to the excellent works of Nielsen 
and Chuang [39], Yanofsky and Mannucci [60], and Mingsheng [61]. 


2 An Overview on Circuit Width Estimation 


Quipper allows programmers to describe quantum circuits in a high-level and 
elegant way, using both gate-by-gate and circuit transformation approaches. 
Quipper also supports hierarchical and parametric circuits, thus promoting a 
view in which circuits become first-class citizens. Quipper has been shown to 
be scalable, in the sense that it has been effectively used to describe complex 
quantum algorithms that easily translate to circuits involving trillions of gates 
applied to millions of qubits. The language allows the programmer to optimize 
the circuit, e.g. by using ancilla qubits for the sake of reducing the circuit depth, 
or recycling qubits that are no longer needed. 

One feature that Quipper lacks is a methodology for statically proving that
important parameters of the underlying circuit, such as its width,
are below certain limits, which of course would need to be parametric on the
input size of the circuit. If this kind of analysis were available, then it would be 
possible to derive bounds on the number of qubits needed to solve any instance 
of a problem, and ultimately to know in advance how big of an instance can be 
possibly solved given a fixed amount of qubits. 

In order to illustrate the kind of scenario we are reasoning about, this section 
offers some simple examples of Quipper programs, showing in what sense we can 
think of capturing the quantitative information that we are interested in through 
types and effect systems and linear dependency. We proceed at a very high level 
for now, without any ambition of formality. 

Let us start with the example of Figure 1. The Quipper function on the
left builds the structure on the right, which we call a quantum circuit. For the
purposes of this work, it suffices to say that horizontal lines represent qubits,
while other symbols represent elementary operations applied to them, e.g. initializations,
gate applications, and so on. Time flows from left to right. The
specific circuit in Figure 1 consists in an (admittedly contrived) implementation
of the quantum not operation. The dumbNot function implements negation using 
a controlled not gate and an ancillary qubit a, which is initialized and discarded 
within the body of the function. This qubit does not appear in the interface of 
the circuit, but it clearly adds to its overall width, which is 2. 

    dumbNot :: Qubit -> Circ Qubit
    dumbNot q = do
      a <- qinit True
      (q,a) <- controlled_not q a
      qdiscard a
      return q

    [Circuit diagram: input wire q, output wire q, and one ancilla wire initialized to |1⟩]


Fig. 1. A contrived implementation of the quantum not operation using an ancilla


Consider now the higher-order function in Figure 2. This function takes as
input a circuit-building function f and an integer n, and describes the circuit obtained
by applying f’s circuit n times to the input qubit q. It is easy to see that the
width of the circuit produced in output by iter dumbNot n is equal to 2, even
though, overall, the number of qubits initialized during the computation is equal
to n. The point is that each ancilla is created only after the previous one has
been discarded, thus enabling a form of qubit recycling.


    iter :: (Qubit -> Circ Qubit)
         -> Int -> Qubit -> Circ Qubit
    iter f 0 q = return q
    iter f n q = do
      q <- f q
      iter f (n-1) q

    [Circuit diagram: the dumbNot circuit repeated n times on wire q, each repetition using an ancilla initialized to |1⟩]


Fig. 2. A higher-order function which iterates a circuit-building function f on an input
qubit q and the result of its application to the dumbNot function from Figure 1


Is it possible to statically analyze the width of the circuit produced in output 
by iter dumbNot n so as to conclude that it is constant and equal to 2? What 
techniques can we use? Certainly, the presence of higher order types complicates 
an already non-trivial problem. The approach we propose in this paper is based 
on two ingredients. The first is the so-called effect typing [40]. In this context the 
effect produced by the program is nothing more than the circuit and therefore 
it is natural to think of an effect system in which the width of such circuit, and 
only that, is exposed. Therefore, the arrow type A ⊸ B should be decorated
with an expression indicating the width of the circuit produced by the corre- 
sponding function when applied to an argument of type A. Of course, the width 
of an individual circuit is a natural number, so it would make sense to annotate 
the arrow with such a number. For technical reasons, however, it will also be 
necessary to keep track of another natural number, corresponding to the number 
of wire resources that the function captures from the surrounding environment. 
This necessity stems from a need to keep track of wires even in the presence of 
data hiding, and will be explained in further detail in Section 

Under these premises, the dumbNot function would receive type Qubit ⊸_{2,0}
Qubit, meaning that it takes as input a qubit and produces a circuit of width 2
which outputs a qubit. Note that the second annotation is 0, since we do not capture
anything from the function’s environment, let alone a wire. Consequently,
because iter iterates in sequence and because the ancillary qubit in dumbNot
can be reused, the type of iter dumbNot n would also be Qubit ⊸_{2,0} Qubit.
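As an informal sanity check of the reasoning above, the following self-contained Haskell sketch models circuit building with a small state monad that records only the number of live wires and the largest number of wires alive at any point, which is the width. It is not Quipper: the names Build, Wires and widthOn1 are hypothetical, qinit ignores the initial value of the qubit, and gates are assumed to leave the wire count unchanged.

```haskell
-- A toy model of circuit building that records only wire counts.
-- All names here (Build, Wires, widthOn1, ...) are hypothetical.

-- | Wires currently live, and the largest number ever live at once (the width).
data Wires = Wires { live :: Int, peak :: Int } deriving (Eq, Show)

-- | A minimal state monad over 'Wires'.
newtype Build a = Build { runBuild :: Wires -> (a, Wires) }

instance Functor Build where
  fmap f (Build g) = Build $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Build where
  pure a = Build $ \s -> (a, s)
  Build f <*> Build g = Build $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad Build where
  Build g >>= k = Build $ \s -> let (a, s') = g s in runBuild (k a) s'

-- | Stand-in for a qubit; the model does not track its quantum state.
data Qubit = Qubit

-- | Initialize a fresh qubit: one more live wire, possibly a new peak.
qinit :: Build Qubit
qinit = Build $ \(Wires l p) -> (Qubit, Wires (l + 1) (max p (l + 1)))

-- | Discard a qubit: its wire can be recycled afterwards.
qdiscard :: Qubit -> Build ()
qdiscard _ = Build $ \(Wires l p) -> ((), Wires (l - 1) p)

-- | Gates act on existing wires and never change the wire count.
controlledNot :: Qubit -> Qubit -> Build (Qubit, Qubit)
controlledNot q a = pure (q, a)

dumbNot :: Qubit -> Build Qubit
dumbNot q = do
  a <- qinit
  (q', a') <- controlledNot q a
  qdiscard a'
  pure q'

iter :: (Qubit -> Build Qubit) -> Int -> Qubit -> Build Qubit
iter _ 0 q = pure q
iter f n q = f q >>= iter f (n - 1)

-- | Width of the circuit built by a function applied to a single input wire.
widthOn1 :: (Qubit -> Build Qubit) -> Int
widthOn1 f = peak (snd (runBuild (f Qubit) (Wires 1 1)))
```

Under these assumptions, widthOn1 dumbNot and widthOn1 (iter dumbNot n) both evaluate to 2 for any n ≥ 1, in agreement with the annotation 2 in the type above. Unlike a type-based analysis, this is a run-time measurement on one concrete n and not a static guarantee.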


    hadamardN :: [Qubit] -> Circ [Qubit]
    hadamardN [] = return []
    hadamardN (q:qs) = do
      q <- hadamard q
      qs <- hadamardN qs
      return (q:qs)

    [Circuit diagram: a Hadamard gate applied to each input wire]


Fig. 3. The hadamardN function implements a circuit family where circuits have width 
linear in their input size. 


Let us now consider a slightly different situation, in which the width of the 
produced circuit is not constant, but rather increases proportionally to the circuit’s
input size. Figure 3 shows a Quipper function that returns a circuit on n
qubits in which the Hadamard gate is applied to each qubit, a common prepro- 
cessing step in many quantum algorithms. It is obvious that this function works 
on inputs of arbitrary size, and therefore we can interpret it as a circuit family, 
parametrized on the length of the input list of qubits. This quantity, although 
certainly a natural number, is unknown statically and corresponds precisely to 
the width of the produced circuit. It is thus natural to wonder whether the kind of 
effect typing we briefly hinted at in the previous paragraph is capable of dealing 
with such a function. Certainly, the expressions used to annotate arrows cannot 
be, like in the previous case, mere constants, as they clearly depend on the size 
of the input list. Is there a way to reflect this dependency in types? Certainly, 
one could go towards a fully-fledged notion of dependent types, like the ones pro- 
posed in [22], but a simpler approach, in the style of Dal Lago and Gaboardi’s 
linear dependent types [12,14,15,23] turns out to be enough for this purpose. This
is precisely the route that we follow in this paper. In this approach, terms can
indeed appear in types, but that is only true for a very restricted class of terms,
disjoint from the ordinary ones, called index terms. As an example, the type of
the function hadamardN above could become List^i Qubit ⊸_{i,0} List^i Qubit, where
i is an index variable. The meaning of the type would thus be that hadamardN
takes as input any list of qubits of length i and produces a circuit of width at
most i which outputs i qubits. Indices are better explained in Section 4, but
in general we can say that they consist of arithmetical expressions over natural
numbers and index variables, and can thus express non-trivial dependencies
between input sizes and corresponding circuit widths.
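The idea that a length index in a type can tie the size of the output to the size of the input can be illustrated, loosely, in plain Haskell with length-indexed lists. This is only an analogy under stated assumptions: Haskell's promoted naturals are not the index terms of this paper, the names Nat, Vec and vlength are hypothetical, hadamard is a placeholder that does nothing, and the width annotation on the arrow has no counterpart here.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Hypothetical illustration: lists whose length is part of their type.
data Nat = Z | S Nat

data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- | Stand-in for a qubit; no quantum state is modelled.
data Qubit = Qubit

-- | Placeholder for the Hadamard gate (assumption: one wire in, one wire out).
hadamard :: Qubit -> Qubit
hadamard q = q

-- | The signature alone guarantees that a list of length n is mapped
-- to a list of the same length n.
hadamardN :: Vec n Qubit -> Vec n Qubit
hadamardN VNil         = VNil
hadamardN (VCons q qs) = VCons (hadamard q) (hadamardN qs)

-- | Forget the index and recover the length as an ordinary number.
vlength :: Vec n a -> Int
vlength VNil         = 0
vlength (VCons _ xs) = 1 + vlength xs
```

Here the shared variable n plays the role that the index variable i plays in List^i Qubit ⊸_{i,0} List^i Qubit: a definition of hadamardN that dropped or duplicated an element would be rejected by the type checker.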


3 The Proto-Quipper Language


This section aims at introducing the Proto-Quipper family of calculi to the non- 
specialist, without any form of resource analysis. At its core, Proto-Quipper is a 
linear lambda calculus with bespoke constructs to build and manipulate circuits. 
Circuits are built as a side-effect of a computation, behind the scenes, but they 
can also appear and be manipulated as data in the language. 


Types            TYPE   A,B ::= 1 | w | !A | A ⊗ B | A ⊸ B | List A | Circ(T,U)
Parameter types  PTYPE  P,R ::= 1 | !A | P ⊗ R | List P | Circ(T,U)
Bundle types     BTYPE  T,U ::= 1 | w | T ⊗ U


Fig. 4. The Proto-Quipper types 


The types of Proto-Quipper are given in Figure 4. Speaking at a high level,
we can say that Proto-Quipper employs a linear-nonlinear typing discipline. In
particular, w ∈ {Bit, Qubit} is a wire type and is linear, while ⊸ is the linear
arrow constructor. A subset of types, called parameter types, represent the values
of the language that are not linear and that can therefore be copied. Any term of
type A can be lifted into a duplicable parameter of type !A if its type derivation
does not require the use of linear resources.


Terms         TERM  M,N ::= V W | let (x,y) = V in M | force V | box_T V
                          | apply(V,W) | return V | let x = M in N
Values        VAL   V,W ::= * | x | ℓ | λx_A.M | lift M | (ℓ̄,C,k̄) | (V,W)
                          | nil | cons V W | fold V W
Wire bundles  BVAL  ℓ̄,k̄ ::= * | ℓ | (ℓ̄,k̄)


Fig. 5. The Proto-Quipper syntax 


The syntax of Proto-Quipper is given in Figure 5. At a very high level, we
are dealing with an effectful lambda calculus with bespoke constructs for ma- 
nipulating circuits. A return expression turns a value into a trivial computation, 
while a let expression is used to sequence computations. Note that let is asso- 
ciative and that return acts as its identity. Now, let us informally dissect the 
domain-specific aspects of this language, starting with the language of values. 
The constructs of greatest interest are labels and boxed circuits. A label ℓ represents
a reference to a free wire of the underlying circuit being built and is
attributed a wire type w ∈ {Bit, Qubit}. Due to the no-cloning property of quantum
states [39], labels have to be treated linearly. Arbitrary structures of labels
form a subset of values which we call wire bundles and which are given bundle
types. On the other hand, a boxed circuit (ℓ̄,C,k̄) represents a circuit object C as
a datum within the language, together with its input and output interfaces ℓ̄ and
k̄. Such a value is given parameter type Circ(T,U), where bundle types T and
U are the input and output types of the circuit, respectively. Boxed circuits can 
be copied, manipulated by primitive functions and, more importantly, applied 
to the underlying circuit. This last operation, which lies at the core of Proto- 
Quipper’s circuit-building capabilities, is possible thanks to the apply operator. 
This operator takes as first argument a boxed circuit (ℓ̄,C,k̄) and appends C to
the underlying circuit D. How does apply know where exactly in D to apply C? 
Thanks to a second argument: a bundle of wires ¢ coming from the free output 
wires of D, which identify the exact location where C is supposed to be attached. 


The language is expected to be endowed with constant boxed circuits corre- 
sponding to fundamental gates and operations (e.g. Hadamard, CNOT, initial- 
ization, etc.), but the programmer can also introduce their own boxed circuits via 
the box operator. Intuitively, box takes as input a circuit-building function and 
executes it in a sandboxed environment, on dummy arguments, in a way that 
leaves the underlying circuit unchanged. Said function produces a standalone 
circuit C, which is then returned by the box operator as a boxed circuit (ℓ̄,C,k̄).

Figure 6 shows the Proto-Quipper term corresponding to the Quipper program
in Figure 1, as an example of the use of the language. Note that let (x, y) =
M in N is syntactic sugar for let z = M in let (x,y) = z in N. The dumbNot
function is given type Qubit ⊸ Qubit and builds the circuit shown in Figure 1
when applied to an argument.


    dumbNot ≜ λq_Qubit. let a = apply(INIT_1, *) in
                        let (q, a) = apply(CNOT, (q, a)) in
                        let _ = apply(DISCARD, a) in
                        return q


Fig. 6. An example Proto-Quipper program. INIT_1, CNOT and DISCARD are primitive
boxed circuits implementing the corresponding elementary operations. 


On the classical side of things, it is worth mentioning that Proto-Quipper as 
presented in this section does not support general recursion. A limited form of 
recursion on lists is instead provided via a primitive fold constructor, which takes 
as argument a (copiable) step function of type !((B ⊗ A) ⊸ B), an initial value of
type B, and constructs a function of type List A ⊸ B. Although this workaround
is not enough to recover the full power of general recursion, it appears that it
is enough to describe many quantum algorithms. Figure 7 shows an example of
the use of fold to reverse a list. Note that λ(x,y)_{A⊗B}.M is syntactic sugar for
λz_{A⊗B}.let (x,y) = z in M.


    rev ≜ fold lift(λ(revList, q)_{List Qubit ⊗ Qubit}. return (cons q revList)) nil


Fig. 7. An example of the use of fold: the function that reverses a list 
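Read as ordinary Haskell, and assuming that fold consumes the list from its head while threading an accumulator (the reading under which the term of Figure 7 reverses its input), the fold constructor and rev can be sketched as follows. The name foldPQ is hypothetical, and neither linearity nor the lift of the step function is enforced or modelled by Haskell's type system here.

```haskell
-- | Hypothetical Haskell rendering of the fold constructor: a step function
-- on (accumulator, element) pairs, an initial accumulator, and a list.
foldPQ :: ((b, a) -> b) -> b -> [a] -> b
foldPQ _    acc []       = acc
foldPQ step acc (x : xs) = foldPQ step (step (acc, x)) xs

-- | List reversal, mirroring the term rev of Figure 7: each element is
-- consed onto the list reversed so far.
rev :: [a] -> [a]
rev = foldPQ (\(revList, q) -> q : revList) []
```

For instance, on the list [1,2,3] the accumulator takes the values [], [1], [2,1] and finally [3,2,1]. A fold that consumed the list from its tail would instead return the input unchanged with this step function, which is why the head-first reading is assumed.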


To conclude this section, we just remark how all of the Quipper programs 
shown in Section 2 can be encoded in Proto-Quipper. However, Proto-Quipper’s
system of simple types is unable to tell us anything about the resource consumption
of these programs. Of course, one could run hadamardN on a concrete input
and examine the size of the circuit produced at run-time, but this amounts to 
testing, not verifying the program, and lacks the qualities of staticity and para- 
metricity that we seek. 


4 Incepting Linear Dependency and Effect Typing 


We are now ready to expand on the informal definition of the Proto-Quipper 
language given in Section 3, to reach a formal definition of Proto-Quipper-R: a
linearly and dependently typed language whose type system supports the deriva- 
tion of upper bounds on the width of the circuits produced by programs. 


4.1 Types and Syntax of Proto-Quipper-R 


Types         TYPE   A,B ::= 1 | w | !A | A ⊗ B | A ⊸_{I,J} B | List^I A | Circ^I(T,U)
Param. types  PTYPE  P,R ::= 1 | !A | P ⊗ R | List^I P | Circ^I(T,U)
Bundle types  BTYPE  T,U ::= 1 | w | T ⊗ U | List^I T


Terms         TERM   M,N ::= V W | let (x,y) = V in M | force V | box_T V
                           | apply(V,W) | return V | let x = M in N
Values        VAL    V,W ::= * | x | ℓ | λx_A.M | lift M | (ℓ̄,C,k̄) | (V,W)
                           | nil | cons V W | fold_i V W
Wire bundles  BVAL   ℓ̄,k̄ ::= * | ℓ | (ℓ̄,k̄) | nil | cons ℓ k̄
Indices       INDEX  I,J ::= i | n | I + J | I − J | I × J | max(I,J) | max_{i<I} J


Fig. 8. Proto-Quipper-R syntax and types 


The types and syntax of Proto-Quipper-R are given in Figure 8. As we mentioned,
one of the key ingredients of our type system are the index terms with
which we annotate standard Proto-Quipper types. These indices provide quantitative
information about the elements of the resulting types, in a manner reminiscent
of refinement types [18,47]. In our case, we are primarily concerned with
circuit width, which means that the natural starting point of our extension of
Proto-Quipper is precisely the circuit type: Circ^I(T,U) has as elements the boxed
circuits of input type T, output type U, and width bounded by I. Term I is
precisely what we call an index, that is, an arithmetical expression denoting a 
natural number. Looking at the grammar for indices, their interpretation is fairly 
straightforward, with a few notes: n is a natural number, i is an index variable,
I − J denotes natural subtraction, such that I − J = 0 whenever I < J, and
lastly max_{i<I} J is the maximum for i going from 0 (included) to I (excluded) of
J, where i can occur in J. Note that I = 0 implies max_{i<I} J = 0. While the index
in a circuit type denotes an upper bound, the index in a type of the form List^I A
denotes the exact length of the lists of that type. While this refinement might 
seem quite restrictive in a generic scenario, it allows us to include lists of labels 
among wire bundles, something that was not possible with simple lists. This is 
due to the fact that sized lists are effectively isomorphic to finite tensors, and 
therefore a sized list of labels represent a wire bundle of known size, whereas the 
same is not true for a simple list of labels. Lastly, as we anticipated in Section [2] 
an arrow type A —;,; B is annotated with two indices: J is an upper bound to 
the width of the circuit built by the function once it is applied to an argument of 
type A, while J describes the exact number of wires captured in the function’s 
closure. The utility of this last annotation will be clearer in Section 
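
The interpretation of indices described above can be sketched as a small evaluator. This is only an illustration: the tuple encoding of index terms (`"lit"`, `"var"`, `"add"`, `"sub"`, `"mul"`, `"max"`, `"bmax"`) and the name `eval_index` are assumptions made here and are not part of Proto-Quipper-R.

```python
def eval_index(ix, env=None):
    """Evaluate an index term to a natural number under the assignment `env`.

    Hypothetical encoding: ("lit", n), ("var", "i"), ("add", I, J),
    ("sub", I, J), ("mul", I, J), ("max", I, J), ("bmax", "i", I, J),
    where ("bmax", "i", I, J) stands for the bounded maximum max_{i<I} J.
    """
    env = env or {}
    tag = ix[0]
    if tag == "lit":
        return ix[1]
    if tag == "var":
        return env[ix[1]]
    if tag == "bmax":
        _, var, bound, body = ix
        upper = eval_index(bound, env)
        # i ranges over 0 .. upper-1; an empty range yields 0
        return max((eval_index(body, {**env, var: k}) for k in range(upper)),
                   default=0)
    left, right = eval_index(ix[1], env), eval_index(ix[2], env)
    if tag == "add":
        return left + right
    if tag == "sub":
        return max(0, left - right)  # natural subtraction never goes below 0
    if tag == "mul":
        return left * right
    if tag == "max":
        return max(left, right)
    raise ValueError(f"unknown index constructor: {tag}")
```

For instance, `eval_index(("sub", ("lit", 2), ("lit", 5)))` evaluates to 0, in line with the convention for natural subtraction.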

The languages for terms and values are almost the same as in Proto-Quipper, 
with the minor difference that the fold operator now binds the index variable
name $i$ within its first argument. This variable appears locally in the type of the
step function, in such a way as to allow each iteration of the fold to contribute 
to the overall circuit width in a different way. 


4.2 A Formal Language for Circuits 


The type system of Proto-Quipper-R is designed to allow for reasoning about 
the width of circuits. Therefore, before we formally introduce the type system 
in Section 4.3, we ought to introduce circuits themselves in a formal way. So far,
we have only spoken of circuits at a very high and intuitive level, and we have
represented them only graphically. Looking at the circuits in Section 2, what do
they have in common? At the fundamental level, they are made up of elementary 
operations applied to specific wires. Of course, the order of these operations 
matters, as does the order of wires that they are applied to. In the existing 
literature on Proto-Quipper, circuits are usually interpreted as morphisms in a 
symmetric monoidal category [46], but this approach makes it particularly hard 
to reason about their intensional properties, such as width. For this reason, we 
opt for a concrete model of wires and circuits, rather than an abstract one. 

Luckily, we already have a datatype modeling ordered structures of wires, 
that is, the wire bundles that we introduced in the previous sections. We use 
them as the basis upon which we build circuits. 

That being said, Figure 9 introduces the Circuit Representation Language
(CRL), which we use as the target for circuit building in Proto-Quipper-R. Wire
bundles are exactly as in Figure 8 and represent arbitrary structures of wires,
while circuits themselves are defined very simply as sequences of elementary
operations applied to said structures.

\[
\begin{array}{llrcl}
\text{Wire bundles} & \mathit{BVAL} & \bar\ell,\bar k & ::= & * \mid \ell \mid \langle \bar\ell,\bar k\rangle \mid \mathsf{nil} \mid \mathsf{cons}\ \bar\ell\ \bar k \\
\text{Bundle types} & \mathit{BTYPE} & T,U & ::= & 1 \mid w \mid T \otimes U \mid \mathsf{List}^{I}\,T \\[6pt]
\text{Circuits} & \mathit{CIRC} & C,D & ::= & id_Q \mid C;\ g(\bar\ell) \to \bar k
\end{array}
\]

Fig. 9. CRL syntax and types

We call $Q$ a label context and define it as a mapping from label names to wire
types. We use label contexts as a means to keep track of the set of labels
available in a computation, alongside their respective types. Circuit $id_Q$
represents the identity circuit taking as input the labels in $Q$ and returning
them unchanged, while $C;\ g(\bar\ell) \to \bar k$ represents the application of the
elementary operation $g$ to the wires identified by $\bar\ell$ among the outputs of $C$.
Operation $g$ outputs the wire bundle $\bar k$, whose labels become part of the outputs
of the overall circuit. Note that an “elementary operation” is usually the
application of a gate, but it could also be a measurement, or the
initialization or discarding of a wire. Although semantically very different,
from the perspective of circuit building these operations are just elementary
building blocks in the construction of a more complex structure, and it makes
no sense to distinguish between them syntactically. Circuits are amenable to a
form of concatenation. We write the concatenation of $C$ and $D$ as $C :: D$ and
define it in the natural way, that is, as $C$ followed by all the operations
occurring in $D$.
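
To make the shape of CRL circuits concrete, the following sketch models a circuit as an input label context plus a sequence of elementary operations, and concatenation as the juxtaposition of operation sequences. The class and function names (`Op`, `Circuit`, `then`, `concat`) are hypothetical, and wire bundles are simplified to flat tuples of labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """An elementary operation g(l) -> k applied to specific labels."""
    gate: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


@dataclass
class Circuit:
    """id_Q followed by a sequence of elementary operations."""
    inputs: Dict[str, str]                      # label context Q: label -> wire type
    ops: List[Op] = field(default_factory=list)

    def then(self, gate, inputs, outputs):
        """Build the circuit C; g(inputs) -> outputs without mutating C."""
        return Circuit(dict(self.inputs),
                       self.ops + [Op(gate, tuple(inputs), tuple(outputs))])


def concat(c: Circuit, d: Circuit) -> Circuit:
    """C :: D, i.e. C followed by all the operations occurring in D."""
    return Circuit(dict(c.inputs), c.ops + d.ops)


# Example: H on wire a, then X on the resulting wire b.
c = Circuit({"a": "Qubit"}).then("H", ["a"], ["b"])
d = Circuit({"b": "Qubit"}).then("X", ["b"], ["c"])
cd = concat(c, d)
```

Note that this sketch does not check that the concatenation is meaningful; that is the job of the circuit typing discussed next.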


Circuit Typing Naturally, not all circuits built from the CRL syntax make
sense. For example, $id_{(\ell:\mathsf{Qubit})};\ H(k) \to k$ and
$id_{(\ell:\mathsf{Qubit})};\ \mathit{CNOT}(\langle \ell,\ell\rangle) \to \langle k,t\rangle$
are both syntactically correct, but the first applies a gate to a non-existing wire,
while the second violates the no-cloning theorem by duplicating $\ell$. To rule out
such ill-formed circuits, we employ a rudimentary type system for circuits which
allows us to derive judgments of the form $C : Q \to L$, which informally read
“circuit $C$ is well-typed with input label context $Q$ and output label context $L$”.

The typing rules for CRL are given in Figure 10. We call $Q \vdash_w \bar\ell : T$ a wire
judgment, and we use it to give a structured type to an otherwise unordered
label context, by means of a wire bundle. Most rules are straightforward, except
those for lists, which rely on a judgment of the form $\vDash I = J$. This is to be
understood as a semantic judgment asserting that $I$ and $J$ are closed and equal
when interpreted as natural numbers. Within the rules, this reflects the idea
that there are many ways to syntactically represent the length of a list. For
example, nil can be given type $\mathsf{List}^{0}\,T$, but also $\mathsf{List}^{1-1}\,T$ or
$\mathsf{List}^{0\times 5}\,T$. This kind of flexibility might seem unwarranted for such a
simple language, but it is useful to effectively interface CRL and the more
complex Proto-Quipper-R. Speaking of the actual circuit judgments, the seq rule
tells us that the application of an elementary operation $g$ is well typed
whenever $g$ only acts on labels occurring in the outputs of $C$ (those in $\bar\ell$,
that is, in $H$), produces in output labels that do not clash with the remaining
outputs of $C$ (since $L,K$ denotes the disjoint union of the two label contexts)
and is of the right type. This last requirement is expressed as
$g \in \mathscr{G}(T,U)$, where $\mathscr{G}(T,U)$ is the subset of elementary operations
that can be applied to an input of type $T$ to obtain an output of type $U$. For
example, the Hadamard gate, which acts on a single qubit, is in
$\mathscr{G}(\mathsf{Qubit},\mathsf{Qubit})$.

\[
\frac{}{\emptyset \vdash_w * : 1}\;\textit{unit}
\qquad
\frac{}{\ell : w \vdash_w \ell : w}\;\textit{lab}
\qquad
\frac{\vDash I = 0}{\emptyset \vdash_w \mathsf{nil} : \mathsf{List}^{I}\,T}\;\textit{nil}
\]
\[
\frac{Q_1 \vdash_w \bar\ell : T \qquad Q_2 \vdash_w \bar k : U}
     {Q_1,Q_2 \vdash_w \langle \bar\ell,\bar k\rangle : T \otimes U}\;\textit{pair}
\qquad
\frac{Q_1 \vdash_w \bar\ell : T \qquad Q_2 \vdash_w \bar k : \mathsf{List}^{J}\,T \qquad \vDash I = J+1}
     {Q_1,Q_2 \vdash_w \mathsf{cons}\ \bar\ell\ \bar k : \mathsf{List}^{I}\,T}\;\textit{cons}
\]
\[
\frac{}{id_Q : Q \to Q}\;\textit{id}
\qquad
\frac{C : Q \to L,H \qquad H \vdash_w \bar\ell : T \qquad K \vdash_w \bar k : U \qquad g \in \mathscr{G}(T,U)}
     {C;\ g(\bar\ell) \to \bar k : Q \to L,K}\;\textit{seq}
\]

Fig. 10. The CRL type system
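
The label-level part of the seq rule can be approximated by a simple check that walks a circuit and tracks which labels are currently available. This is a rough sketch under stated assumptions: operations are encoded as `(gate, input_labels, [(output_label, wire_type), ...])`, the function names are made up for this example, and the side condition $g \in \mathscr{G}(T,U)$ on gate signatures is deliberately not modelled.

```python
def output_context(input_ctx, ops):
    """Return the output label context of a circuit, or raise ValueError.

    input_ctx: dict mapping labels to wire types (the label context Q).
    ops: list of (gate, input_labels, [(output_label, wire_type), ...]).
    """
    live = dict(input_ctx)
    for gate, ins, outs in ops:
        if len(set(ins)) != len(ins):
            raise ValueError(f"{gate}: a label is used more than once")
        missing = [label for label in ins if label not in live]
        if missing:
            raise ValueError(f"{gate}: no such wires {missing}")
        for label in ins:          # the consumed labels, H in the seq rule
            del live[label]
        for label, wire_type in outs:
            if label in live:      # outputs must not clash with what remains
                raise ValueError(f"{gate}: output label {label!r} clashes")
            live[label] = wire_type
    return live


def is_well_formed(input_ctx, ops):
    """True when every operation only touches available, distinct labels."""
    try:
        output_context(input_ctx, ops)
        return True
    except ValueError:
        return False
```

Under this encoding, both ill-formed examples from the text are rejected: the first because the label `k` is not available, the second because `l` is used twice.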


Circuit Width Among the many properties of circuits, we are interested in 
width, so we conclude this section by giving a formal status to this quantity. 


Definition 1 (Circuit Width). We define the width of a CRL circuit $C$, written
$\mathrm{width}(C)$, as follows

\[
\begin{aligned}
\mathrm{width}(id_Q) &= |Q|, && (1)\\
\mathrm{width}(C;\ g(\bar\ell) \to \bar k) &= \mathrm{width}(C) + \max(0, \mathrm{new}(g) - \mathrm{discarded}(C)), && (2)
\end{aligned}
\]

where $|Q|$ is the number of labels in $Q$, $\mathrm{new}(g)$ represents the net number of
new wires initialized by $g$, and $\mathrm{discarded}(C)$ is the number of wires that have
been effectively discarded by the end of $C$, obtained as the difference between
$C$’s width and the number of its outputs. Note that one expects $\mathrm{new}(g)$ to be
equal to the difference between the number of labels in $\bar k$ and those in $\bar\ell$. The
overarching idea behind this definition is that whenever we require new wires
in our computation, we first try to reuse as many previously discarded wires as
possible. As long as we can do this ($\mathrm{new}(g) \leq \mathrm{discarded}(C)$), the
initializations do not add to the total width of the circuit. Otherwise
($\mathrm{new}(g) > \mathrm{discarded}(C)$) we must actually create new wires, increasing the
overall width of the circuit.
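
Equations (1) and (2) translate almost literally into a loop. In the sketch below, which is only illustrative, each elementary operation is abstracted to a pair `(n_in, n_out)` giving the number of labels it consumes and produces, so that `n_out - n_in` plays the role of new(g); this encoding and the function name are assumptions of the example.

```python
def width(num_inputs, ops):
    """Width of a circuit id_Q; g_1; ...; g_n with |Q| = num_inputs.

    ops: list of (n_in, n_out) pairs, one per elementary operation.
    """
    total = num_inputs   # width(id_Q) = |Q|
    live = num_inputs    # number of outputs of the circuit built so far
    for n_in, n_out in ops:
        new = n_out - n_in          # net number of wires initialized by g
        discarded = total - live    # wires available for reuse
        total += max(0, new - discarded)
        live += new
    return total
```

For example, initializing a wire, discarding it, and initializing another one on top of a single input yields width 2 rather than 3, because the second initialization reuses the discarded wire.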

Now that we have a formal definition of circuit types and width, we can
state a fundamental property of the concatenation of well-typed circuits, which
is illustrated in Figure 11 and proven in Theorem 1. We use this result pervasively
in proving the correctness of Proto-Quipper-R in Section 5.


Theorem 1 (CRL). Given $C : Q \to L,H$ and $D : H \to K$ such that the labels
shared by $C$ and $D$ are all and only those in $H$, we have


1. $C :: D : Q \to L,K$,
2. $\mathrm{width}(C :: D) \leq \max(\mathrm{width}(C), \mathrm{width}(D) + |L|)$.


Proof. By induction on the derivation of $D : H \to K$.


Fig. 11. The kind of scenario described by Theorem 1
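
As a quick sanity check of the second claim, consider the following small instance, whose circuits and labels are invented here purely for illustration. Let $C = id_Q$ with $Q = \{a, b, c\}$, split as $L = \{a\}$ and $H = \{b, c\}$, and let $D$ take $H$ as input and initialize two fresh wires $d$ and $e$ without discarding anything.

```latex
\begin{aligned}
\mathrm{width}(C) &= |Q| = 3, \\
\mathrm{width}(D) &= |H| + 1 + 1 = 4, \\
\mathrm{width}(C :: D) &= 3 + \max(0, 1 - 0) + \max(0, 1 - 0) = 5, \\
\max(\mathrm{width}(C), \mathrm{width}(D) + |L|) &= \max(3, 4 + 1) = 5 .
\end{aligned}
```

In this instance the bound is met with equality: the wire in $L$ does not take part in $D$, but it still has to flow alongside it.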


4.3 Typing Programs 


Going back to Proto-Quipper-R, we have already seen how the standard
Proto-Quipper types are refined with quantitative information. However,
decorating types is not enough for the purposes of width estimation. Recall
that, in general, a Proto-Quipper program produces a circuit as a side effect of
its evaluation. If we want to reason about the width of said circuit, it is not
enough to rely on a regular linear type system, even a dependent one. Rather, we
have to introduce the second ingredient of our analysis and turn to a
type-and-effect system [40], revolving around a type judgment of the form

\[
\Theta; \Gamma; Q \vdash_c M : A; I, \qquad (3)
\]

which intuitively reads “for all natural values of the index variables in $\Theta$,
under typing context $\Gamma$ and label context $Q$, term $M$ has type $A$ and produces a
circuit of width at most $I$”. Therefore, $\Theta$ is a collection of index variables
which are universally quantified in the rest of the judgment, while $\Gamma$ is a
typing context for parameter and linear variables alike. When a typing context
contains exclusively parameter variables, we write it as $\Phi$. In this judgment,
$I$ plays the role of an effect annotation, describing a relevant aspect of the
side effect produced by the evaluation of $M$ (i.e. the width of the produced
circuit). The attentive reader might wonder why this annotation consists only of
one index, whereas when we discussed arrow types in previous sections we needed
two. The reason is that the second index, which we use to keep track of the
number of wires captured by a function, is redundant in a typing judgment where
the same quantity can be inferred directly from the environments $\Gamma$ and $Q$. A
similar typing judgment of the form $\Theta; \Gamma; Q \vdash_v V : A$ is introduced for
values, which are effect-less.

The rules for deriving typing judgments are those in Figure 12, where $\Gamma_1,\Gamma_2$
denotes the union of two contexts with disjoint domains. A well-formedness
judgment of the form $\Theta \vdash I$ means that all the free index variables occurring
in $I$ are in $\Theta$. Well-formedness is lifted to types and typing contexts in the
natural way. Among the interesting typing rules, we can see how the circ rule
bridges between CRL and Proto-Quipper-R. A boxed circuit $(\bar\ell, C, \bar k)$ is well
typed with type $\mathsf{Circ}^{I}(T,U)$ when $C$ is no wider than the quantity denoted by
$I$, $C : Q \to L$, and $\bar\ell, \bar k$ contain all and only the labels in $Q$ and $L$,
respectively, acting as a language-level interface to $C$.

Fig. 12. Proto-Quipper-R type system

The two main constructs that interact with circuits are apply and box. The
apply rule is the foremost place where effects enter the type derivation: $V$
represents some boxed circuit of width at most $I$, so its application to an
appropriate wire bundle $W$ produces exactly a circuit of width at most $I$. The
box rule, on the other hand, works approximately in the opposite direction. If
$V$ is a circuit building function that, once applied to an input of type $T$,
would build a circuit of output type $U$ and width at most $I$, then boxing it
means turning it into a boxed circuit with the same characteristics. Note that
the box rule requires that the typing context be devoid of linear variables.
This reflects the idea that $V$ is meant to be executed in complete isolation, to
build a standalone, replicable circuit, and therefore it should not capture any
linear resource (e.g. a label) from the surrounding environment.


Wire Count Notice that many rules rely on an operator written $\#(\cdot)$, which
we call the wire count operator. Intuitively, this operator returns the number of
wire resources (in our case, bits or qubits) represented by a type or context. To
understand why this is important, consider the return rule. The return operator
turns a value $V$ into a trivial computation that evaluates immediately to $V$, and
therefore it would be tempting to give it an effect annotation of 0. However,
$V$ is not necessarily a closed value. In fact, it might very well contain many
bits and qubits, coming both from the typing context $\Gamma$ and the label context
$Q$. Although nothing happens to these bits and qubits, they still correspond
to wires in the underlying circuit, and these wires have a width which must
be accounted for in the judgment for the otherwise trivial computation. The
return rule therefore produces an effect annotation of the form $\#(\Gamma; Q)$, which
is shorthand for $\#(\Gamma) + \#(Q)$ and corresponds exactly to this quantity. A formal
definition of the wire count operator on types is given in the following definition,
which is lifted to contexts in the natural way.


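Definition 2 (Wire Count). We define the wire count of a type $A$, written
$\#(A)$, as a function $\#(\cdot) : \mathit{TYPE} \to \mathit{INDEX}$ such that

\[
\begin{gathered}
\#(1) = \#({!A}) = \#(\mathsf{Circ}^{I}(T,U)) = 0, \qquad \#(w) = 1, \\
\#(A \otimes B) = \#(A) + \#(B), \qquad \#(A \multimap_{I,J} B) = J, \qquad \#(\mathsf{List}^{I}\,A) = I \times \#(A).
\end{gathered}
\]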
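
Definition 2 can be transcribed as a recursive function over types. The sketch below assumes a hypothetical tuple encoding of types in which all indices are already closed natural numbers: `("unit",)`, `("wire",)`, `("bang", A)`, `("tensor", A, B)`, `("arrow", A, B, I, J)`, `("list", I, A)` and `("circ", I, T, U)`.

```python
def wire_count(ty):
    """Number of wires (bits or qubits) represented by a closed type."""
    tag = ty[0]
    if tag in ("unit", "bang", "circ"):
        return 0                      # no wires are held by these types
    if tag == "wire":
        return 1
    if tag == "tensor":
        return wire_count(ty[1]) + wire_count(ty[2])
    if tag == "arrow":
        return ty[4]                  # the second annotation J of the arrow
    if tag == "list":
        return ty[1] * wire_count(ty[2])
    raise ValueError(f"unknown type constructor: {tag}")


QUBIT = ("wire",)
PAIR_LIST = ("list", 3, ("tensor", QUBIT, QUBIT))   # a list of 3 qubit pairs
CLOSURE = ("arrow", QUBIT, QUBIT, 5, 2)             # a function capturing 2 wires
```

Here the arrow case simply reads off the annotation, which is the only way the number of captured wires can be known from the type alone.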


This definition is fairly straightforward, except for the arrow case. By itself,
an arrow type does not give us any information about the number of qubits or bits
captured in the corresponding closure. This is precisely where the second index
$J$, which keeps track exactly of this quantity, comes into play. This annotation
is introduced by the abs rule and allows our analysis to circumvent data hiding.

The let rule is another rule in which wire counts are essential. The two terms
$M$ and $N$ in $\mathsf{let}\ x = M\ \mathsf{in}\ N$ build the circuits $C_M$ and $C_N$, whose widths
are bounded by $I$ and $J$, respectively. Once again, it might be tempting to
conclude that the overall circuit built by the let construct has width bounded
by $\max(I, J)$, but this fails to take into account the fact that while $M$ is
building $C_M$ starting from the wires contained in $\Gamma_1$ and $Q_1$, we must keep
aside the wires contained in $\Gamma_2$ and $Q_2$, which will be used by $N$ to build
$C_N$. These wires must flow alongside $C_M$ and their width, i.e. $\#(\Gamma_2; Q_2)$,
adds up to the total width of the left-hand side of the let construct, leading
to an overall width upper bound of $\max(I + \#(\Gamma_2; Q_2), J)$. This situation is
better illustrated in Figure 13.


Fig. 13. The shape of a circuit built by a let construct 
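
To see why the wire count matters in the let rule, consider the following made-up numbers, chosen only for illustration: $M$ builds a circuit of width at most $I = 4$, $N$ builds a circuit of width at most $J = 5$, and $N$ additionally uses 3 wires from its own contexts, i.e. $\#(\Gamma_2; Q_2) = 3$.

```latex
\begin{aligned}
\text{naive bound:} \quad & \max(I, J) = \max(4, 5) = 5, \\
\text{bound from the let rule:} \quad & \max(I + \#(\Gamma_2; Q_2), J) = \max(4 + 3, 5) = 7 .
\end{aligned}
```

The naive bound would be unsound here: while $C_M$ is being built, up to 4 wires are in use by $M$ and 3 more are waiting to be consumed by $N$, for a total of 7.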


The last rule that makes substantial use of wire counts is fold, arguably the
most complex rule of the system. The main ingredient of the fold rule is the
bound index variable $i$, which occurs in the accumulator type $B$ and is used to
keep track of the number of steps performed by the fold. Let $(\cdot)\{I/i\}$ denote
the capture-avoiding substitution of the index term $I$ for the index variable $i$
inside an index, type, context, value or term, just as $(\cdot)[V/x]$ denotes the
capture-avoiding substitution of the value $V$ for the variable $x$. Intuitively,
if the accumulator has initially type $B\{0/i\}$ and each application of the step
function increases $i$ by one, then when we fold over a list of length $I$ we get
an output of type $B\{I/i\}$. Index $E$ is the upper bound to the width of the
overall circuit built by the fold: if the input list is empty, then the width of
the circuit is just the number of wires contained in the initial accumulator,
that is, $\#(\Gamma; Q)$. If the input list is non-empty, on the other hand, things
get slightly more complicated. At each step $i$, the step function builds a
circuit $C_i$ of width bounded by $J$, where $J$ might depend on $i$. This circuit
takes as input all the wires in the accumulator, as well as the wires contained
in the first element of the input list, which are $\#(A)$. The wires contained in
the remaining $I - 1 - i$ elements have to flow alongside $C_i$, giving a width
upper bound of $J + (I - 1 - i) \times \#(A)$ at each step $i$. The overall width
upper bound is then the maximum for $i$ going from 0 to $I - 1$ of this quantity,
i.e. precisely $\max_{i<I} J + (I - 1 - i) \times \#(A)$. Once again, a graphical
representation of this scenario is given in Figure 14.

Fig. 14. The shape of a circuit built by a fold applied to an input list of type $\mathsf{List}^{I}\,A$
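
The width bound for fold described above can be computed directly for concrete values. The helper below is a sketch: its name and parameters are hypothetical, and it reads the bound $E$ as the maximum of the two cases discussed in the text (the empty-list case and the per-step bound).

```python
def fold_width_bound(length, step_width, elem_wires, initial_wires):
    """max(#(Gamma; Q), max_{i < I} J(i) + (I - 1 - i) * #(A)).

    length:        I, the length of the input list
    step_width:    function i -> J, the width bound of the i-th step
    elem_wires:    #(A), the wires held by each list element
    initial_wires: #(Gamma; Q), the wires held by the initial accumulator
    """
    per_step = (step_width(i) + (length - 1 - i) * elem_wires
                for i in range(length))
    return max(initial_wires, max(per_step, default=0))
```

For instance, with a step of width `i + 2` over a list of 4 single-qubit elements and an accumulator holding one wire, every step is bounded by 5, so `fold_width_bound(4, lambda i: i + 2, 1, 1)` is 5.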


Subtyping Notice that Proto-Quipper-R’s type system includes two subsumption
rules, which are effectively the same rule for terms and values, respectively:
csub and vsub. We mentioned that our type system resembles a refinement type
system, and all such systems induce a subtyping relation between types, where
$A$ is a subtype of $B$ whenever the former is “at least as refined” as the latter. In
our case, a subtyping judgment such as $\Theta \vdash_s A <: B$ means that for all natural
values of the index variables in $\Theta$, $A$ is a subtype of $B$.


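\[
\frac{}{\Theta \vdash_s 1 <: 1}\;\textit{unit}
\qquad
\frac{}{\Theta \vdash_s w <: w}\;\textit{wire}
\qquad
\frac{\Theta \vdash_s A <: B}{\Theta \vdash_s {!A} <: {!B}}\;\textit{bang}
\]
\[
\frac{\Theta \vdash_s A_1 <: A_2 \qquad \Theta \vdash_s B_1 <: B_2}
     {\Theta \vdash_s A_1 \otimes B_1 <: A_2 \otimes B_2}\;\textit{tensor}
\]
\[
\frac{\Theta \vdash_s A_2 <: A_1 \qquad \Theta \vdash_s B_1 <: B_2 \qquad \Theta \vDash I_1 \leq I_2 \qquad \Theta \vDash J_1 = J_2}
     {\Theta \vdash_s A_1 \multimap_{I_1,J_1} B_1 <: A_2 \multimap_{I_2,J_2} B_2}\;\textit{arrow}
\]
\[
\frac{\Theta \vdash_s A <: B \qquad \Theta \vDash I = J}
     {\Theta \vdash_s \mathsf{List}^{I}\,A <: \mathsf{List}^{J}\,B}\;\textit{list}
\]
\[
\frac{\Theta \vdash_s T_1 <:> T_2 \qquad \Theta \vdash_s U_1 <:> U_2 \qquad \Theta \vDash I \leq J}
     {\Theta \vdash_s \mathsf{Circ}^{I}(T_1,U_1) <: \mathsf{Circ}^{J}(T_2,U_2)}\;\textit{circ}
\]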


Fig. 15. Proto-Quipper-R subtyping rules 


We derive judgments of this kind by the rules in Figure 15. Note that
$\Theta \vdash_s A <:> B$ is shorthand for “$\Theta \vdash_s A <: B$ and $\Theta \vdash_s B <: A$”.
Subtyping relies in turn on a judgment of the form $\Theta \vDash I \leq J$, which is a
generalization of the semantic judgment that we used in the CRL type system in
Section 4.2. Such a judgment asserts that for all natural values of the index
variables in $\Theta$, $I$ is less than or equal to $J$. Consequently, $\Theta \vDash I = J$ is
shorthand for “$\Theta \vDash I \leq J$ and $\Theta \vDash J \leq I$”. We purposefully leave the
decision procedure for judgments of this kind unspecified, with the prospect
that, from a more practical perspective, they could be delegated to an SMT
solver [7].
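
Since the decision procedure for index judgments is left unspecified, the following is not a decision procedure at all, but a bounded sanity check that can help spot a wrong annotation early. It tests an inequality for every assignment of small naturals to the index variables; the function name and the representation of indices as Python functions over an environment are assumptions of this sketch. A `True` result is only evidence, not a proof, whereas a `False` result is a genuine counterexample.

```python
from itertools import product


def holds_up_to(bound, variables, lhs, rhs):
    """Check lhs(env) <= rhs(env) for all assignments of 0..bound-1.

    variables: names of the index variables in the index context
    lhs, rhs:  functions from an environment (dict) to a natural number
    """
    for values in product(range(bound), repeat=len(variables)):
        env = dict(zip(variables, values))
        if lhs(env) > rhs(env):
            return False
    return True
```

For example, `holds_up_to(6, ["i", "j"], lambda e: e["i"], lambda e: e["i"] + e["j"])` finds no counterexample, while replacing the right-hand side with `e["i"] - 1` fails immediately at `i = 0`.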


4.4 Operational Semantics 


Operationally speaking, it does not make sense, in the Proto-Quipper languages,
to speak of the semantics of a term in isolation: a term is always evaluated in
the context of an underlying circuit that supplies all of the term’s free labels.
We therefore define the operational semantics of Proto-Quipper-R as a big-step
evaluation relation $\Downarrow$ on configurations, i.e. circuits paired with either terms
or values. Intuitively, $(C, M) \Downarrow (D, V)$ means that $M$ evaluates to $V$ and updates
$C$ to $D$ as a side effect.


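\[
\frac{(C, M[V/x]) \Downarrow (D, W)}
     {(C, (\lambda x_A.M)\,V) \Downarrow (D, W)}\;\textit{app}
\qquad
\frac{(C, M[V/x][W/y]) \Downarrow (D, X)}
     {(C, \mathsf{let}\ \langle x,y\rangle = \langle V,W\rangle\ \mathsf{in}\ M) \Downarrow (D, X)}\;\textit{dest}
\]
\[
\frac{(C, M) \Downarrow (D, V)}
     {(C, \mathsf{force}(\mathsf{lift}\ M)) \Downarrow (D, V)}\;\textit{force}
\qquad
\frac{(E, \bar q) = \mathrm{append}(C, \bar t, (\bar\ell, D, \bar k))}
     {(C, \mathsf{apply}((\bar\ell, D, \bar k), \bar t)) \Downarrow (E, \bar q)}\;\textit{apply}
\]
\[
\frac{(Q, \bar\ell) = \mathrm{freshlabels}(T) \qquad (id_\emptyset, M) \Downarrow (id_\emptyset, V) \qquad (id_Q, V\,\bar\ell) \Downarrow (D, \bar k)}
     {(C, \mathsf{box}_T(\mathsf{lift}\ M)) \Downarrow (C, (\bar\ell, D, \bar k))}\;\textit{box}
\]
\[
\frac{}{(C, \mathsf{return}\ V) \Downarrow (C, V)}\;\textit{return}
\qquad
\frac{(C, M) \Downarrow (E, V) \qquad (E, N[V/x]) \Downarrow (D, W)}
     {(C, \mathsf{let}\ x = M\ \mathsf{in}\ N) \Downarrow (D, W)}\;\textit{let}
\]
\[
\frac{}{(C, (\mathsf{fold}_i\ V\ W)\ \mathsf{nil}) \Downarrow (C, W)}\;\textit{fold-end}
\]
\[
\frac{(C, M\{0/i\}) \Downarrow (C, Y) \qquad (C, Y\,\langle V,W\rangle) \Downarrow (E, Z) \qquad (E, (\mathsf{fold}_i\ (\mathsf{lift}\ M\{i+1/i\})\ Z)\ W') \Downarrow (D, X)}
     {(C, (\mathsf{fold}_i\ (\mathsf{lift}\ M)\ V)\ (\mathsf{cons}\ W\ W')) \Downarrow (D, X)}\;\textit{fold-step}
\]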


Fig. 16. Proto-Quipper-R big-step operational semantics 


The rules for evaluating configurations are given in Figure 16, where $C$, $D$
and $E$ are circuits, $M$ and $N$ are terms, while $V$, $W$, $X$, $Y$ and $Z$ are values.
Most evaluation rules are straightforward, with the exception perhaps of apply,
box and fold-step. Since apply is the fundamental block of circuit building, its
semantics lies almost entirely in the way it updates the underlying circuit. The
concatenation of the underlying circuit $C$ and the applicand $D$ is delegated
entirely to the append function, which is given in Definition 4. Before we
examine the append function, however, consider that when we deal with circuit
objects we are not really interested in the concrete labels that occur in them,
but rather in the structure that they convey. For this reason, we introduce the
following notion of circuit equivalence.


Definition 3 (Circuit Equivalence). We say that two boxed circuits $(\bar\ell, C, \bar k)$
and $(\bar t, D, \bar q)$ are equivalent, and we write $(\bar\ell, C, \bar k) \cong (\bar t, D, \bar q)$, when
there exists a renaming $\rho$ of labels such that $\rho(\bar\ell) = \bar t$, $\rho(\bar k) = \bar q$ and
$\rho(C) = D$.


We can now move on to the definition of append, where the notion of circuit 
equivalence is used to instantiate the generic input interface of a boxed circuit 
with the actual labels that it is going to be appended to, and to ensure that there 
are no name clashes between the appended circuit and the underlying circuit. 


Definition 4 (append). We define the append of $(\bar\ell, D, \bar k)$ to $C$ on $\bar t$, written
$\mathrm{append}(C, \bar t, (\bar\ell, D, \bar k))$, as the function that performs the following steps:


1. Finds $(\bar t, D', \bar q) \cong (\bar\ell, D, \bar k)$ such that the labels shared by $C$ and $D'$
are all and only those in $\bar t$,

2. Computes $E = C :: D'$,

3. Returns $(E, \bar q)$.
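
The three steps of append can be sketched as follows. This is an illustrative approximation, not the definition itself: circuits are reduced to lists of `(gate, input_labels, output_labels)` triples, wire bundles are flat tuples of labels, the set of labels occurring in $C$ is passed explicitly as `c_labels`, and the fresh-name scheme `w0, w1, ...` is arbitrary.

```python
from itertools import count


def append(c_ops, c_labels, target, boxed):
    """Append the boxed circuit `boxed` to C on the labels `target`.

    Returns the operations of E = C :: D' together with the renamed
    output interface q of D'.
    """
    ins, ops, outs = boxed
    if len(ins) != len(target):
        raise ValueError("input interface and target bundle differ in size")
    rho = dict(zip(ins, target))            # step 1: inputs of D become t
    used = set(c_labels) | set(target)
    fresh = (f"w{n}" for n in count())

    def rename(label):
        # Every other label of D gets a name that does not occur in C.
        if label not in rho:
            rho[label] = next(name for name in fresh if name not in used)
            used.add(rho[label])
        return rho[label]

    renamed = [(g, tuple(map(rename, i)), tuple(map(rename, o)))
               for g, i, o in ops]
    return c_ops + renamed, tuple(map(rename, outs))   # steps 2 and 3


# C applies H to a, producing b; the boxed circuit reuses the names a and b.
c_ops = [("H", ("a",), ("b",))]
boxed = (("a",), [("X", ("a",), ("b",))], ("b",))
e_ops, q = append(c_ops, {"a", "b"}, ("b",), boxed)
```

In the example, the boxed circuit’s input `a` is instantiated with the actual label `b`, and its output `b` is renamed to the fresh `w0` to avoid a clash with the underlying circuit.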


On the other hand, the semantics of a term of the form $\mathsf{box}_T(\mathsf{lift}\ M)$ relies
on the freshlabels function. What freshlabels does is take as input a bundle type
$T$ and instantiate a fresh $Q$ and $\bar\ell$ such that $Q \vdash_w \bar\ell : T$. The wire bundle $\bar\ell$ is
then used as a dummy argument to $V$, the circuit-building function resulting from
the evaluation of $M$. This function application is evaluated in the context of the
identity circuit $id_Q$ and eventually produces a circuit $D$, together with its output
labels $\bar k$. Finally, $\bar\ell$ and $\bar k$ become respectively the input and output interfaces
of the boxed circuit $(\bar\ell, D, \bar k)$, which is the result of the evaluation of
$\mathsf{box}_T(\mathsf{lift}\ M)$.

Note, at this point, that $T$ controls how many labels are initialized by the
freshlabels function. Because $T$ can contain indices (e.g. it could be that
$T = \mathsf{List}^{3}\,\mathsf{Qubit}$), it follows that in Proto-Quipper-R indices are not only
relevant to typing, but they also have operational value. For this reason, the
semantics of Proto-Quipper-R is well-defined only on terms closed both in the
sense of regular variables and index variables, since a circuit-building
function of input type, say, $\mathsf{List}^{i}\,\mathsf{Qubit}$ does not correspond to any individual
circuit, and therefore it makes no sense to box it. This aspect of the semantics
is also apparent in the fold-step rule, where the index variable $i$ occurring
free in $M$ is instantiated to 0 before evaluating $M$ to obtain the step function
$Y$. Then, before evaluating the next fold, $i$ is replaced with $i + 1$ in $M$,
increasing the index for the next iteration.
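
The operational role of indices is easy to see in a sketch of freshlabels: the function can only produce a concrete bundle when every list length in the bundle type is a known natural number. The encoding of bundle types as tuples and the label naming scheme `l0, l1, ...` are assumptions made for this example.

```python
from itertools import count


def freshlabels(bundle_type, counter=None):
    """Return (Q, bundle): a label context and a bundle of fresh labels.

    Hypothetical encoding of bundle types: ("unit",), ("wire", name),
    ("tensor", T, U) and ("list", n, T) with n a concrete natural number.
    """
    counter = counter if counter is not None else count()
    tag = bundle_type[0]
    if tag == "unit":
        return {}, ()
    if tag == "wire":
        label = f"l{next(counter)}"
        return {label: bundle_type[1]}, label
    if tag == "tensor":
        q1, left = freshlabels(bundle_type[1], counter)
        q2, right = freshlabels(bundle_type[2], counter)
        return {**q1, **q2}, (left, right)
    if tag == "list":
        _, length, elem = bundle_type
        ctx, items = {}, []
        for _ in range(length):     # requires a closed, concrete length
            q, item = freshlabels(elem, counter)
            ctx.update(q)
            items.append(item)
        return ctx, items
    raise ValueError(f"unknown bundle type constructor: {tag}")


ctx, bundle = freshlabels(("list", 3, ("wire", "Qubit")))
```

With a symbolic length in place of 3, the loop would have nothing to iterate over, which mirrors the observation that a function of input type $\mathsf{List}^{i}\,\mathsf{Qubit}$ cannot be boxed.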


5 Type Safety and Correctness 


Because the operational semantics of Proto-Quipper-R is based on configurations,
we ought to adopt a notion of well-typedness which is also based on
configurations. The following definition of well-typed configuration is thus
central to our type-safety analysis.

Definition 5 (Well-typed Configuration). We say that configuration $(C, M)$
is well-typed with input $Q$, type $A$, width $I$ and output $L$, and we write
$Q \vdash (C, M) : A; I; L$, whenever $C : Q \to L,H$ for some $H$ such that
$\emptyset; \emptyset; H \vdash_c M : A; I$. We write $Q \vdash (C, V) : A; L$ whenever $C : Q \to L,H$ for
some $H$ such that $\emptyset; \emptyset; H \vdash_v V : A$.


The three results that we want to show in this section are that any well-typed
term configuration $Q \vdash (C, M) : A; I; L$ evaluates to some configuration $(D, V)$,
that $Q \vdash (D, V) : A; L$ and that $D$ is obtained from $C$ by extending it with a
sub-circuit of width at most $I$. These claims correspond to the subject reduction
and total correctness properties that we will prove at the end of this section.
However, both these results rely on a central lemma and on the mutual notions
of realization and reducibility, which we first give formally.


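Definition 6 (Realization). We define $V \Vdash_Q A$, which reads “$V$ realizes $A$
under $Q$”, as the smallest relation such that

- $* \Vdash_\emptyset 1$,
- $\ell \Vdash_{\ell:w} w$,
- $V \Vdash_Q A \multimap_{I,J} B$ iff $\vDash J = |Q|$ and $\forall W : W \Vdash_L A \implies V\,W \Vdash^{I}_{Q,L} B$,
- $\mathsf{lift}\ M \Vdash_\emptyset {!A}$ iff $M \Vdash^{0}_\emptyset A$,
- $\langle V,W\rangle \Vdash_{Q,L} A \otimes B$ iff $V \Vdash_Q A$ and $W \Vdash_L B$,
- $\mathsf{nil} \Vdash_\emptyset \mathsf{List}^{I}\,A$ iff $\vDash I = 0$,
- $\mathsf{cons}\ V\ W \Vdash_{Q,L} \mathsf{List}^{I}\,A$ iff $\vDash I = J + 1$ and $V \Vdash_Q A$ and $W \Vdash_L \mathsf{List}^{J}\,A$,
- $(\bar\ell, C, \bar k) \Vdash_\emptyset \mathsf{Circ}^{I}(T,U)$ iff $C : Q \to L$ and $Q \vdash_w \bar\ell : T$ and $L \vdash_w \bar k : U$
  and $\vDash \mathrm{width}(C) \leq I$.


Definition 7 (Reducibility). We say that $M$ is reducible under $Q$ with type
$A$ and width $I$, and we write $M \Vdash^{I}_Q A$, if, for all $C$ such that $C : L \to Q,H$,
there exist $D, V$ such that


1. $(C, M) \Downarrow (C :: D, V)$,
2. $\vDash \mathrm{width}(D) \leq I$,
3. $D : Q \to K$ for some $K$ such that $V \Vdash_K A$.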


Both relations, and in particular reducibility, are given in the form of unary
logical relations [55]. The intuition is pretty straightforward: a term is reducible
with width $I$ if it evaluates correctly when paired with any circuit $C$ which
provides its free labels and if it extends $C$ with a sub-circuit $D$ whose width is
bounded by $I$. Realization, on the other hand, is less immediate. For most cases,
realizing type $A$ loosely corresponds to being closed and well-typed with type
$A$, but a value realizes an arrow type $A \multimap_{I,J} B$ when its application to a value
realizing $A$ is reducible with type $B$ and width $I$.

By themselves, realization and reducibility are defined only on terms and
values closed in the sense both of regular and index variables. To extend these
notions to open terms and values, we adopt the standard approach of reasoning
explicitly about the substitutions that would render them closed. A closing value
substitution $\gamma$ is a function that turns an open term $M$ into a closed term $\gamma(M)$
by substituting a value for each free variable occurring in $M$. We say that $\gamma$
implements a typing context $\Gamma$ using label context $Q$, and we write $\gamma \vDash_Q \Gamma$,
when it replaces every variable $x_i$ in the domain of $\Gamma$ with a value $V_i$ such
that $V_i \Vdash_{Q_i} \Gamma(x_i)$ and $Q = \biguplus_{x_i \in \mathrm{dom}(\Gamma)} Q_i$. A closing index
substitution $\theta$ is similar, only it substitutes closed indices for index
variables and can be applied to indices, types, contexts, values and terms
alike. We say that $\theta$ implements an index context $\Theta$, and we write $\theta \vDash \Theta$,
when it replaces every index variable in $\Theta$ with a closed index term. This
allows us to give the following fundamental lemma, which will be used while
proving all other claims.


Lemma 1 (Core Correctness). Let $\Pi$ be a type derivation. For all $\theta \vDash \Theta$ and
$\gamma \vDash_Q \theta(\Gamma)$, we have that

\[
\begin{array}{rcl}
\Pi \triangleright \Theta; \Gamma; L \vdash_c M : A; I & \implies & \gamma(\theta(M)) \Vdash^{\theta(I)}_{Q,L} \theta(A), \\
\Pi \triangleright \Theta; \Gamma; L \vdash_v V : A & \implies & \gamma(\theta(V)) \Vdash_{Q,L} \theta(A).
\end{array}
\]

Proof. By induction on the size of $\Pi$, making use of Theorem 1.


Lemma 1 tells us that any well-typed term (resp. value) is reducible (resp.
realizes its type) when we instantiate its free variables according to its contexts.
Now that we have Lemma 1, we can proceed to proving the aforementioned
results of subject reduction and total correctness. We start with the former,
which unsurprisingly requires the following substitution lemmata.


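Lemma 2 (Index Substitution). Let $\Pi$ be a type derivation and let $I$ be an
index such that $\Theta \vdash I$. We have that

\[
\begin{array}{rcl}
\Pi \triangleright \Theta, i; \Gamma; Q \vdash_c M : A; J & \implies & \Theta; \Gamma\{I/i\}; Q \vdash_c M\{I/i\} : A\{I/i\}; J\{I/i\}, \\
\Pi \triangleright \Theta, i; \Gamma; Q \vdash_v V : A & \implies & \Theta; \Gamma\{I/i\}; Q \vdash_v V\{I/i\} : A\{I/i\}.
\end{array}
\]

Proof. By induction on the size of $\Pi$.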


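Lemma 3 (Value Substitution). Let $\Pi$ be a type derivation and let $V$ be a
value such that $\Theta; \Phi, \Gamma_1; Q_1 \vdash_v V : A$. We have that

\[
\begin{array}{rcl}
\Pi \triangleright \Theta; \Phi, \Gamma_2, x : A; Q_2 \vdash_c M : B; I & \implies & \Theta; \Phi, \Gamma_1, \Gamma_2; Q_1, Q_2 \vdash_c M[V/x] : B; I, \\
\Pi \triangleright \Theta; \Phi, \Gamma_2, x : A; Q_2 \vdash_v W : B & \implies & \Theta; \Phi, \Gamma_1, \Gamma_2; Q_1, Q_2 \vdash_v W[V/x] : B.
\end{array}
\]

Proof. By induction on the size of $\Pi$.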


Theorem 2 (Subject Reduction). If $Q \vdash (C, M) : A; I; L$ and
$(C, M) \Downarrow (D, V)$, then $Q \vdash (D, V) : A; L$.


Proof. By induction on the derivation of $(C, M) \Downarrow (D, V)$ and case analysis on
the last rule used in its derivation. Lemma 3 is essential to the app, dest and
let cases, while Lemma 2 is used in the fold-step case. Lemma 1 is essential to
the box case, as it is the only case in which the side effect of the evaluation
(the circuit built by the function being boxed), whose preservation is a matter
of correctness, becomes a value (the resulting boxed circuit).

Of course, type soundness is not enough: we also want the resource analysis
carried out by our type system to be correct, as stated in the following theorem.


Theorem 3 (Total Correctness). If $Q \vdash (C, M) : A; I; L$, then there exist
$D, V$ such that $(C, M) \Downarrow (C :: D, V)$ and $\vDash \mathrm{width}(D) \leq I$.


Proof. By definition, $Q \vdash (C, M) : A; I; L$ entails that $C : Q \to L,H$ and
$\emptyset; \emptyset; H \vdash_c M : A; I$. Since an empty context is trivially implemented by an
empty closing substitution, by Lemma 1 we get $M \Vdash^{I}_H A$, which by definition
entails that there exist $D, V$ such that $(C, M) \Downarrow (C :: D, V)$ and
$\vDash \mathrm{width}(D) \leq I$.


6 A Practical Example 


This section provides an example of how Proto-Quipper-R can be used to verify 
the resource usage of realistic quantum algorithms. In particular, we use our 
language to implement the QFT algorithm [11,39] and verify that the circuits it
produces have width no greater than the size of their input, i.e. that the QFT 
algorithm does not overall use additional ancillary qubits. 


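\[
\begin{array}{l}
\mathit{qft} = \mathsf{fold}_j\ \mathit{qftStep}\ \mathsf{nil} \\[6pt]
\mathit{qftStep} = \mathsf{lift}(\mathsf{return}\ \lambda\langle \mathit{qs}, q\rangle_{\mathsf{List}^{j}\,\mathsf{Qubit} \otimes \mathsf{Qubit}}. \\
\qquad \mathsf{let}\ \langle n, \mathit{qs}\rangle = \mathit{qlen}\ \mathit{qs}\ \mathsf{in} \\
\qquad \mathsf{let}\ \mathit{revQs} = \mathit{rev}\ \mathit{qs}\ \mathsf{in} \\
\qquad \mathsf{let}\ \langle q, \mathit{qs}\rangle = (\mathsf{fold}_e\ (\mathsf{lift}(\mathit{rotate}\ n))\ \langle q, \mathsf{nil}\rangle)\ \mathit{revQs}\ \mathsf{in} \\
\qquad \mathsf{let}\ q = \mathsf{apply}(\mathsf{H}, q)\ \mathsf{in} \\
\qquad \mathsf{return}\ (\mathsf{cons}\ q\ \mathit{qs})) \\[6pt]
\mathit{rotate} = \lambda n_{\mathsf{Nat}}.\ \mathsf{return}\ \lambda\langle\langle q, \mathit{cs}\rangle, c\rangle_{(\mathsf{Qubit} \otimes \mathsf{List}^{e}\,\mathsf{Qubit}) \otimes \mathsf{Qubit}}. \\
\qquad \mathsf{let}\ \langle m, \mathit{cs}\rangle = \mathit{qlen}\ \mathit{cs}\ \mathsf{in} \\
\qquad \mathsf{let}\ \mathit{rgate} = \mathit{makeRGate}\ (n + 1 - m)\ \mathsf{in} \\
\qquad \mathsf{let}\ \langle q, c\rangle = \mathsf{apply}(\mathit{rgate}, \langle q, c\rangle)\ \mathsf{in} \\
\qquad \mathsf{return}\ \langle q, \mathsf{cons}\ c\ \mathit{cs}\rangle
\end{array}
\]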


Fig. 17. A Proto-Quipper-R implementation of the Quantum Fourier Transform circuit 
family. The usual syntactic sugar is employed. 


The Proto-Quipper-R implementation of the QFT algorithm is given in Figure 17.
As we walk through the various parts of the program, be aware that we
will focus on the resource aspects of the algorithm, ignoring much of its actual
meaning. Starting bottom-up, we assume that we have an encoding of naturals
in the language and that we can perform arithmetic on them. We also assume
some primitive gates and gate families: H is the boxed circuit corresponding to
the Hadamard gate and has type $\mathsf{Circ}^{1}(\mathsf{Qubit}, \mathsf{Qubit})$, whereas the makeRGate
function has type $\mathsf{Nat} \multimap_{0,0} \mathsf{Circ}^{2}(\mathsf{Qubit} \otimes \mathsf{Qubit}, \mathsf{Qubit} \otimes \mathsf{Qubit})$ and
produces instances of the parametric controlled $R_n$ gate. On the other hand,
qlen and rev stand for regular language terms which implement respectively the
linear list length and reverse functions. They have type
$\mathit{qlen} : \mathsf{List}^{i}\,\mathsf{Qubit} \multimap_{i,0} (\mathsf{Nat} \otimes \mathsf{List}^{i}\,\mathsf{Qubit})$ and
$\mathit{rev} : \mathsf{List}^{i}\,\mathsf{Qubit} \multimap_{i,0} \mathsf{List}^{i}\,\mathsf{Qubit}$ in our type system.

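We now turn our attention to the actual QFT algorithm. Function qftStep
builds a single step of the QFT circuit. The width of the circuit produced at step
$j$ is dominated by the folding of the $\mathit{rotate}\ n$ function, which applies controlled
rotations between appropriate pairs of qubits and has type

\[
(\mathsf{Qubit} \otimes \mathsf{List}^{e}\,\mathsf{Qubit}) \otimes \mathsf{Qubit} \multimap_{e+2,0} \mathsf{Qubit} \otimes \mathsf{List}^{e+1}\,\mathsf{Qubit}, \qquad (4)
\]

meaning that $\mathit{rotate}\ n$ rearranges the structure of its inputs, but overall does
not introduce any new wire. We fold this function starting from an accumulator
$\langle q, \mathsf{nil}\rangle$, meaning that we can give $\mathsf{fold}_e\ (\mathsf{lift}(\mathit{rotate}\ n))\ \langle q, \mathsf{nil}\rangle$ type as
follows:

\[
\frac{
\begin{array}{c}
i, j, e;\ n : \mathsf{Nat};\ \emptyset \vdash_v \mathsf{lift}(\mathit{rotate}\ n) : {!((Q \otimes \mathsf{List}^{e}\,Q) \otimes Q \multimap_{e+2,0} Q \otimes \mathsf{List}^{e+1}\,Q)} \\
i, j;\ q : Q;\ \emptyset \vdash_v \langle q, \mathsf{nil}\rangle : Q \otimes \mathsf{List}^{0}\,Q \qquad i, j \vdash j \qquad i, j \vdash Q
\end{array}
}{
i, j;\ n : \mathsf{Nat}, q : Q;\ \emptyset \vdash_v \mathsf{fold}_e\ \mathsf{lift}(\mathit{rotate}\ n)\ \langle q, \mathsf{nil}\rangle : \mathsf{List}^{j}\,Q \multimap_{j+1,1} Q \otimes \mathsf{List}^{j}\,Q
}\;\textit{fold}
\qquad (5)
\]

where $Q$ is shorthand for Qubit and where we implicitly use the fact that
$i, j \vDash \max(1, \max_{e<j} e + 2 + (j - 1 - e) \times 1) = j + 1$ to simplify the arrow’s
width annotation using vsub and the arrow subtyping rule. Next, we fold over
revQs, which has the same elements as qs and thus has length $j$, and we obtain
that the fold produces a circuit whose width is bounded by $j + 1$. Therefore,
qftStep has type

\[
{!((\mathsf{List}^{j}\,\mathsf{Qubit} \otimes \mathsf{Qubit}) \multimap_{j+1,0} \mathsf{List}^{j+1}\,\mathsf{Qubit})}, \qquad (6)
\]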


which entails that when we pass it as an argument to the topmost fold together 
with nil we can conclude that the type of the qft function is 


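\[
\frac{
\begin{array}{c}
i, j;\ \emptyset;\ \emptyset \vdash_v \mathsf{qftStep} : {!}((\mathsf{List}^{j}\,\mathsf{Qubit} \otimes \mathsf{Qubit}) \multimap_{j+1,0} \mathsf{List}^{j+1}\,\mathsf{Qubit}) \\
i;\ \emptyset;\ \emptyset \vdash_v \mathsf{nil} : \mathsf{List}^{0}\,\mathsf{Qubit} \qquad i \vdash i \qquad i \vdash \mathsf{Qubit}
\end{array}
}{
i;\ \emptyset;\ \emptyset \vdash_v \mathsf{fold}_j\ \mathsf{qftStep}\ \mathsf{nil} : \mathsf{List}^{i}\,\mathsf{Qubit} \multimap_{i,0} \mathsf{List}^{i}\,\mathsf{Qubit}
}\ \textit{fold}
\qquad (7)
\]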


where we once again implicitly simplify the arrow type using the fact that
$i \vDash \max(0, \max_{j<i}(j + 1 + (i - 1 - j) \times 1)) = i$. This concludes our analysis and the
resulting type tells us that qft produces a circuit of width at most i on inputs
of size i, without overall using any additional wires. If we instantiate i to 3, for
example, we can apply qft to a list of 3 qubits to obtain the circuit shown in
Figure 18, whose width is exactly 3.
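
The two arithmetic facts used to simplify the arrow annotations of qftStep and qft can be checked mechanically for small values. The following Python sketch is only a sanity check, not part of the type system: the function names are hypothetical, and it assumes that a maximum over an empty range contributes nothing, so that only the outer constant remains.

```python
def qft_step_width(j: int) -> int:
    """max(1, max_{e<j} (e + 2 + (j - 1 - e) * 1)), the bound used for qftStep."""
    candidates = [e + 2 + (j - 1 - e) * 1 for e in range(j)]
    return max([1] + candidates)


def qft_width(i: int) -> int:
    """max(0, max_{j<i} (j + 1 + (i - 1 - j) * 1)), the bound used for qft."""
    candidates = [j + 1 + (i - 1 - j) * 1 for j in range(i)]
    return max([0] + candidates)


# Each summand collapses to j + 1 (respectively i), independently of the bound variable.
for size in range(0, 50):
    assert qft_step_width(size) == size + 1
    assert qft_width(size) == size
```

For instance, `qft_width(3)` evaluates to 3, matching the width of the circuit obtained for three input qubits.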

Fig. 18. The circuit of input size 3 produced by qft (cons q1 cons q2 cons q3 nil)


To conclude this section, note that for ease of exposition qft actually produces
the reversed QFT circuit. This is not a problem, since the two circuits are
equivalent resource-wise and the actual QFT circuit can be recovered by boxing
the result of qft and reversing it via a primitive operator. Besides, note that
Quipper's internal implementation of the QFT is also reversed [16].


7 Related Work 


The metatheory of quantum circuit description languages, and in particular of 
Quipper-style languages, has been the subject of quite some work in recent 
years, starting with Ross’s thesis on Proto-Quipper-S [48] and going forward with 
Selinger and Rios’s Proto-Quipper-M [46]. In the last five years, some proposals 
have also appeared for more expressive type systems or for language extensions 
that can handle non-standard language features, such as the so-called dynamic 
lifting [8,21,35], available in the Quipper language, or dependent types [22].
Although some embryonic contributions in the direction of analyzing the size of 
circuits produced using Quipper have been given [56], no contribution tackles 
the problem of deriving resource bounds parametric on the size of the input. In 
this, the ability to have types which depend on the input, certainly a feature of 
Proto-Quipper-D [22], is not useful for the analysis of intensional attributes of 
the underlying circuit, simply because such attributes are not visible in types. 

If we broaden the horizon to quantum programming languages other than 
Quipper, we come across, for example, the recent works of Avanzini et al. [5] 
and Liu et al. [36] on adapting the classic weakest precondition technique to 
the cost analysis of quantum programs, which however focus on programs in an 
imperative language. The work of Dal Lago et al. [13] on a quantum language
which characterizes complexity classes for quantum polynomial time should cer- 
tainly be remembered: even though the language allows the use of higher-order 
functions, the manipulation of quantum data occurs directly and not through 
circuits. Similar considerations hold for the recent work of Hainry et al. and 
Yamakami’s algebra of functions [59] in the style of Bellantoni and Cook [6], 
both characterizing quantum polynomial time. 

If we broaden our scope further and become interested in the analysis of 
the cost of classical or probabilistic programs, we face a vast literature, with 
contributions employing a variety of techniques on heterogeneous languages and 
calculi: from functional programs [2,32,33] and term rewriting systems [34I]
to probabilistic [34] and object-oriented programs [19,28]. In this context, the
resource under analysis is often assumed to be computation time, which is rela- 
tively easy to analyze given its strictly monotonic nature. Circuit width, although 
monotonically non-decreasing, evolves in a way that depends on a non-monotonic 
quantity, i.e. the number of wires discarded by a circuit. As a result, width has 
the flavor of space and its analysis is less straightforward. 

It is also worth mentioning that linear dependent types can be seen as a 
specialized version of refinement types [18], which have been used extensively in 
the literature to automatically verify interesting properties of programs [87162]. 
The work of Vazou et al. on Liquid Haskell [57,58] has been
a particular source of inspiration, on account of Quipper being embedded precisely in
Haskell. The liquid type system [47] of Liquid Haskell relies on SMT solvers 
to discharge proof obligations and has been used fruitfully to reason about both 
the correctness and the resource consumption (mainly time complexity) of con- 
crete, practical programs [80]. 


8 Generalization to Other Resource Types 


This work focuses on estimating the width of the circuits produced by Quipper 
programs. This choice is dictated by the fact that the width of a circuit cor- 
responds to the maximum number of distinct wires, and therefore individual 
qubits, required to execute it. Nowadays, this is considered one of the most
precious resources in quantum computing, and as such must be kept under con- 
trol. However, this does not mean that our system could not be adapted to the 
estimation of other parameters. This section outlines how this may be possible. 

First, estimating strictly monotonic resources, such as the total number of 
gates in a circuit, is possible and in fact simpler than estimating width. A single
index term I that measures the number of gates in the circuit built by a
computation would be enough to carry out this analysis. This index would be 
appropriately increased any time an apply instruction is executed, while sequenc- 
ing two terms via let would simply add together the respective indices. 

If we were instead interested in the depth of a circuit, then we would need 
a slightly different approach. Although in principle it would be possible to still
rely on a single index I, this would give rise to a very coarse approximation,
effectively collapsing the analysis of depth to a gate count analysis. A more
precise approximation could instead be obtained by keeping track of depth locally.
More specifically, it would be sufficient to decorate each occurrence of a wire
type $w$ with an index term $I$ so that if a label $\ell$ were typed with $w^{I}$, it would
mean that the sub-circuit rooted in $\ell$ has a depth at most equal to $I$.
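
The difference between a single global index and per-wire indices can be illustrated on concrete circuits. The sketch below is a toy model rather than the proposed type system: a circuit is assumed to be a list of gates, each acting on a non-empty tuple of wire names, and the names `gate_count`, `wire_depths` and `circuit_depth` are hypothetical. Gate count adds up under sequencing, whereas depth is tracked per wire, mirroring the idea of decorating each wire with its own index.

```python
from typing import Dict, List, Tuple

Gate = Tuple[str, Tuple[str, ...]]  # (gate name, wires the gate acts on)


def gate_count(circuit: List[Gate]) -> int:
    """A single global counter: every applied gate increases it by one."""
    return len(circuit)


def wire_depths(circuit: List[Gate]) -> Dict[str, int]:
    """Depth of the sub-circuit rooted in each wire, updated gate by gate."""
    depth: Dict[str, int] = {}
    for _name, wires in circuit:
        level = 1 + max(depth.get(w, 0) for w in wires)
        for w in wires:
            depth[w] = level
    return depth


def circuit_depth(circuit: List[Gate]) -> int:
    return max(wire_depths(circuit).values(), default=0)


first = [("H", ("q1",)), ("H", ("q2",))]
second = [("CNOT", ("q1", "q2")), ("H", ("q3",))]
whole = first + second

# Gate count is additive under sequencing, depth is not.
assert gate_count(whole) == gate_count(first) + gate_count(second)
assert circuit_depth(whole) <= gate_count(whole)
```

Here `circuit_depth(whole)` is 2 while `gate_count(whole)` is 4, which shows how coarse a depth estimate based on a single gate counter can be.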

Finally, it should be mentioned that the resources considered, i.e. the depth, 
width, and gate count of a circuit, can be further refined so as to take into 
account only some kinds of wires and gates. For instance, one could want to 
keep track of the maximum number of qubits needed, ignoring the number of 
classical bits, or at least distinguishing the two parameters, which of course have 
distinct levels of criticality in current quantum hardware. 


9 Conclusion and Future Work


In this paper we introduced a linear dependent type system based on index re- 
finements and effect typing for the paradigmatic calculus Proto-Quipper, with 
the purpose of using it to derive upper bounds on the width of the circuits pro- 
duced by programs. We proved not only the classic type safety properties, but 
also that the upper bounds derived via the system are correct. We also showed 
how our system can verify a realistic quantum algorithm and elaborated on some 
ideas on how our technique could be adapted to other crucial resource types,
like gate count and circuit depth. Ours is the first type system designed specifi- 
cally for the purpose of resource analysis to target circuit description languages 
such as Quipper. Technically, the main novelties are the smooth combination of 
effect typing and index refinements, but also the proof of correctness, in which 
reducibility and effects are shown to play well together. 

Among topics for further work, we can identify three main research direc- 
tions. First and foremost, it would be valuable to investigate the ideas presented 
in this paper from a more practical perspective, that is, to provide a prototype 
implementation of the language with its type-checking procedure. The neces- 
sity to count the wires present in the context (e.g. when typing abstractions) 
makes it difficult to embed Proto-Quipper-R into existing languages, even those 
that, in principle, seem like ideal hosts, like Liquid Haskell or Granule 
[42]. Because of this, we think that it would be better to produce a standalone 
implementation of Proto-Quipper-R that interfaces directly with SMT solvers to 
discharge the semantic judgments that are used pervasively in the typing rules. 


Staying instead on the theoretical side of things, on one hand we have the 
prospect of denotational semantics: most incarnations of Proto-Quipper are en- 
dowed with categorical semantics that model both circuits and the terms of 
the language that build them [21,22,35,46]. We already mentioned how the
intensional nature of the quantity under analysis renders the formulation of an
abstract categorical semantics for Proto-Quipper-R and its circuits a nontrivial 
task, but we believe that one such semantics would help Proto-Quipper-R fit 
better in the Proto-Quipper landscape. 


On the other hand, in Section 8 we briefly discussed how our system could be
modified to handle the analysis of different resource types. It would be interesting 
to test this path and to investigate the possibility of actually generalizing our 
resource analysis, that is, of making it parametric on the kind of resource being 
analyzed. This would allow for the same program in the same language to be 
amenable to different forms of verification, in a very flexible fashion. 
Abstract. In this paper, we study quantitative properties of quantum
programs. Properties of interest include (positive) almost-sure termination,
expected runtime or expected cost, that is, for example, the expected
number of applications of a given quantum gate, etc. After studying
the completeness of these problems in the arithmetical hierarchy over
the Clifford+T fragment of quantum mechanics, we express these problems
using a variation of a quantum pre-expectation transformer, a weakest
pre-condition based technique that allows these quantitative properties
to be computed symbolically. Under a smooth restriction (a restriction
to polynomials of bounded degree over a real closed field), we show that
the quantitative problem, which consists in finding an upper bound on
the pre-expectation, can be decided in time double-exponential in the
size of a program, thus providing, despite its great complexity, one of
the first decidable results on the analysis and verification of quantum
programs. Finally, we sketch how the latter can be transformed into an
efficient synthesis method.


1 Introduction 


Motivations. Quantum computation is a promising and emerging computational
paradigm which can efficiently solve problems considered to be intractable
on classical computers [41,20]. However, the unintuitive nature of quantum
mechanics poses challenging questions for the design and analysis of the corresponding
quantum programs. Indeed, the dynamics of quantum programs are considerably
more complicated than the behavior of classical or probabilistic
programs. Therefore, formal reasoning requires the development of novel methods
and tools, a development that has already started and recently gathered
momentum in various areas, like design automation [43,22], programming
languages [39,2,31,23,15], verification [36,11], etc.

Among these formal methods, those that allow us to obtain quantitative
properties on quantum programs are particularly interesting. They can be used
to obtain relevant information about the computations of a quantum program,
such as the number of qubits used and the number of unitary operators used,
thus enabling the corresponding compiled quantum circuit to be optimized (for
example, by minimizing the use of gates that are hard to make fault-tolerant,
or by reducing the number of qubits) or to avoid undesirable behavior such
as non-termination. Another quantitative property of interest is the
question whether or not a program terminates almost-surely, that is, whether its
probability of non-termination is zero or not. Similarly, we could aim to capture
the expected values of (classical) program variables upon program termination.
The latter can also be employed to reason about the expected runtime or the
expected cost of quantum programs, if we suitably instrument the code with
counter variables.

RUS ≜ i^N = 0;
      x^B = tt;
      while x do {
        q2 = |0⟩;
        q2 *= H;
        q2 *= T;
        i = i + 1;
        q2, q1 *= CNOT;
        q2 *= H;
        q2, q1 *= CNOT;
        q2 *= T;
        i = i + 1;
        q2 *= H;
        x = meas q2
      }

[Step-circuit for one iteration of the loop; diagram omitted.]

Fig. 1. Repeat-until-success program RUS and step-circuit.

To illustrate this, the program of Figure 1 implements a Repeat-Until-Success
algorithm that can be used to simulate quantum unitary operators on input qubit
q1 by using repeated measurements. The quantum step-circuit on the right part
corresponds to one iteration of the loop. Variable i in the program just acts as a
counter for T-gates. Hence an analysis of the expected value of variable i can be
used to infer an upper bound on the expected T-count, i.e., the expected number
of times a T-gate is used in the fully compiled quantum circuit. Such an approach
helps the programmer implement quantum programs using fewer T-gates, which
are costly to implement fault-tolerantly [10,16]. The program thus provides a
simple illustration of why the study of quantitative properties is paramount.

In [6,30], new methodologies named quantum expectation transformers based 
on predicate transformers [13,28] and expectation transformers [32,17] have been 
put forward to naturally express and study the quantitative properties of quan- 
tum programs. However, no attempt was made to automate the corresponding 
techniques or delineate how complicated such an automation could be. Automation
of these formal verification techniques in the context of quantum programs
is a particularly difficult problem. Indeed, the consideration of Hilbert spaces
as a mathematical framework for describing principles and laws of quantum
mechanics makes it seemingly impossible to reason fully automatically about
quantitative properties of quantum programs: they involve computational objects
of exponential dimensions (in the number of qubits) with scalars ranging
over an uncountable domain (i.e., complex numbers C). This problem is directly 
linked to the fact that the set C includes non-computable numbers [42] and that 
testing the inequality < or the equality = of two real numbers is not decidable, 
even if one restricts their study to computable real numbers. Consequently, the 
particular nature of quantum programs and of their semantic domain, Hilbert 
spaces, makes it impossible to directly apply the results obtained in the classical 
and probabilistic setting [37,24]. 


Contributions. In this paper, we study the hardness of the quantitative prop- 
erties of mixed classical-quantum programs and provide a first step towards their 
(full) automation using quantum expectation transformers. 

To this end, we restrict the considered quantum gates to the Clifford+T frag- 
ment, which is known to be the simplest approximately universal fragment of 
quantum mechanics [1]. Clifford+T makes it possible to only consider quantum 
states with algebraic amplitudes, thus restricting the study to a countable
domain. It implies that our results can accommodate quantum gates employed in
actual hardware, recently employed to claim quantum advantage, cf. [3]. Moreover,
the obtained results are very general, as they can be extended to any set of
gates with algebraic coefficients.

As motivated, our first contribution is about the general hardness of deciding
quantitative properties for mixed classical-quantum programs. For a given input
state, we study properties such as (positive) almost-sure termination, (P)AST
for short; testing problems, $\mathrm{TEST}_R$, which consist in comparing a quantum
expectation (for example, the mean value of a variable) with a given value (an
algebraic and positive real number) wrt the relation $R$; and the finiteness
problem, $\mathrm{TEST}_{\neq\infty}$, which consists in checking that a quantum expectation is finite.
For each of those problems, we also study the related universal problem, which
consists in checking the corresponding property for every input. We establish a
precise mapping (Theorem 1) of the inherent complexity of each problem in the
arithmetical hierarchy [34] that is summarized in Table 1 (provided in Section 3).
E.g., AST is $\Pi^0_2$-complete while PAST is $\Sigma^0_2$-complete.

Our second contribution aims to overcome the aforementioned undecidability 
results. For that, we study approximations. More precisely, we focus on infer- 
ring bounding functions (in general depending on the input) on the expected 
values of classical program variables upon termination. The decision problem 
has thus been altered to an inference problem. Further, we restrict the set of 
potential bounding functions. As a suitable class of functions, we consider poly- 
nomials over the real-closed field of the algebraic numbers. The restriction to 
algebraic numbers guarantees that comparison operations between real numbers
remain decidable. On the other hand, for any real closed field, quantifier
elimination for formulas over polynomials is decidable, that is, there exists a 
double-exponential algorithm computing a quantifier-free formula equivalent to 
the original formula [21]. This recasting of the problem and restriction of the 
solution space suffices to render the problem decidable. The inference algorithm 
established remains double-exponential (Theorem 4), thus of similar complexity 
as the underlying quantifier elimination procedure. 

Finally, our last contribution (Section 5) studies effective automation of the 
inference of upper bounds on the expected values of program variables. To im- 
prove upon the double-exponential complexity, we further restrict the class of 
polynomials considered, that is, to degree-2 polynomials and sketch how tech- 
niques from optimization theory can be employed. Several simple quantum algo- 
rithms such as program RUS can be analyzed using this approach (Example 6). 
This further reduction in expressivity allows the encoding of the problem in SMT 
and thus paves the way towards (full) automation. 


Related Work. Predicate transformers [13,28]—on which our work is based— 
were introduced as a method for reasoning about the semantics of imperative pro- 
grams. They have been adapted to the probabilistic setting, leading to the notion 
of expectation transformer [32,17], which has been used to reason about expected 
values [26,8], runtimes [27,33], and costs [7,4,33], and to the quantum paradigm, 
leading to the notion of quantum pre-expectation transformer [35,30,6]. 

The problem of studying the difficulty of analyzing quantitative program 
properties has been deeply studied in the classical setting. To mention a few, 
[14] and [37] study termination properties and runtime/derivational properties 
of first-order programs, respectively. Further, in [24] completeness results for 
various quantitative properties of (pure) probabilistic programs have been estab- 
lished. The inference problem of expectation transformers, i.e., establishing an 
implementation that automates the search for pre-expectations, has been studied
extensively. Examples of successful implementations are presented in [33,7,8].
Up to now, however, no practical, feasible studies have been carried out on quantum
languages. Among the techniques using quantum expectation transformers,
we believe [6] to be the most amenable to automation. Indeed, by lifting upper
invariants of [27] to the quantum setting, it enables approximate reasoning
and eliminates the need to reason about fixpoints or limits, stemming from the
semantics of loops.


2 Quantum Programming Language 


In this section, we introduce the syntax and operational semantics of the
considered mixed classical-quantum imperative programming language.


Syntax. We make use of three basic datatypes $\mathcal{B}$, $\mathcal{N}$ and $\mathcal{Q}$ for Boolean, numbers
(non-negative integers), and qubit data, respectively. Let $\mathcal{K}$ be an arbitrary
classical type in $\{\mathcal{B}, \mathcal{N}\}$. Each program variable comes with a fixed datatype and
can be optionally annotated by its type as a superscript. In what follows, we will
use x, x′, y, ... to denote classical variables of type $\mathcal{K}$ and q, q′, ... to denote
quantum variables of type $\mathcal{Q}$. A program, denoted P, is simply a statement; see
Figure 2. Program statements are either classical assignments, conditionals,
sequences, loops, quantum assignments $\overline{q}$ *= U, or measurements $x^{\mathcal{B}}$ = meas $q^{\mathcal{Q}}$.
A quantum assignment consists in the application of a quantum unitary gate
U of arity ar(U) to a sequence of qubits $\overline{q} \triangleq q_1, \ldots, q_{ar(\mathtt{U})}$. As we will see in
the semantics section, a unitary matrix U will be associated with each quantum
gate U. A measurement performs a single qubit measurement of q in the
computational basis: the outcome is a Boolean value and the quantum state evolves
accordingly. For a given syntactic construct t, let B(t) (respectively N(t), Q(t))
be the set of Boolean (respectively, number, qubit) variables in t.

\[
\begin{array}{rcl}
\mathrm{NExp} \ni n, n_1, n_2 & ::= & x^{\mathcal{N}} \mid n \in \mathbb{N} \mid n_1 + n_2 \mid n_1 - n_2 \mid n_1 \times n_2 \\
\mathrm{BExp} \ni b, b_1, b_2 & ::= & x^{\mathcal{B}} \mid \mathtt{tt} \mid \mathtt{ff} \mid n_1 = n_2 \mid n_1 < n_2 \mid \neg b \mid b_1 \wedge b_2 \mid b_1 \vee b_2 \\
\mathrm{Exp} \ni e, e_1, e_2 & ::= & n \mid b \\
\mathrm{Stmt} \ni \mathsf{stm}, \mathsf{stm}_1, \mathsf{stm}_2 & ::= & \mathtt{skip} \mid x^{\mathcal{K}} = e^{\mathcal{K}} \mid \mathsf{stm}_1; \mathsf{stm}_2 \mid \mathtt{if}\ b^{\mathcal{B}}\ \mathtt{then}\ \mathsf{stm}_1\ \mathtt{else}\ \mathsf{stm}_2 \\
& \mid & \mathtt{while}\ b^{\mathcal{B}}\ \mathtt{do}\ \mathsf{stm} \mid \overline{q}\ \mathtt{*\!=}\ \mathtt{U} \mid x^{\mathcal{B}} = \mathtt{meas}\ q^{\mathcal{Q}}
\end{array}
\]

Fig. 2. Syntax of quantum programs.

Notice that the language encompasses qubit initialization in the basis states. In
particular, we will use q = |0⟩ as syntactic sugar for x = meas q; if x then q *= X
else skip, for X being the Pauli X gate and for some fresh variable x of type
$\mathcal{B}$.


Example 1. Consider the program of Figure 3, adapted from [6], as a simple
leading example. Let H be the unitary operator computing the Hadamard gate.
This program simulates coin tossing by repeatedly measuring the qubit q, until
the measurement outcome ff occurs. The probability to terminate within n steps
depends on the initial state $\rho = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ (a density matrix in $\mathbb{C}^{2 \times 2}$, which implies
$\alpha + \delta = 1$ and $\gamma = \bar{\beta}$) of the qubit q. Variable i is increased by one at each
iteration, and hence, when the program terminates, i stores as final value the
number of loop iterations performed. The overall probability of termination is
1. The mean value of variable i, that is, the expected number of loop iterations,
depends on the program input, in particular on the initial quantum state. After
termination, for an initial state $\rho = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$, its expected value is given by

\[ F(\rho) = p_0 \times 1 + p_1 \times (1 + 2) = p_0 + p_1 + 2p_1 = 1 + (\alpha - \beta - \bar{\beta} + \delta) = 2 - (\beta + \bar{\beta}), \]

where $p_0 = \frac{\alpha + \beta + \bar{\beta} + \delta}{2}$ and $p_1 = 1 - p_0$ are the probabilities of measuring
|0⟩ and |1⟩, respectively, on the first iteration of the loop. For instance, for a qubit
initialized in state $|\varphi\rangle = \sqrt{1/3}\,|0\rangle + \sqrt{2/3}\,|1\rangle$, the corresponding density matrix
is $\rho_{|\varphi\rangle} = |\varphi\rangle\langle\varphi| = \begin{pmatrix} 1/3 & \sqrt{2}/3 \\ \sqrt{2}/3 & 2/3 \end{pmatrix}$ and hence the expected number of loop iterations
is $F(\rho_{|\varphi\rangle}) = 2 - 2\sqrt{2}/3$. It will be simply 2 in the case of an initialization in the
computational basis $|\varphi\rangle = |0\rangle$ or $|\varphi\rangle = |1\rangle$.

Cointoss ≜ x^B = tt;
           i^N = 0;
           while x do {
             i = i + 1;
             q *= H;
             x = meas q
           }

with $H \triangleq \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, and where stm denotes the body of the loop.

Fig. 3. Quantum Coin tossing
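
The closed form for the expected number of iterations in Example 1 can be cross-checked numerically. The following Python sketch is an illustration under stated assumptions, not the analysis technique of the paper: it represents a single-qubit density matrix as a nested list, truncates the loop after a fixed number of iterations (`max_iters` is a hypothetical parameter), and relies on the fact that after outcome tt the qubit is in state |1⟩.

```python
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]  # Hadamard matrix; H is its own conjugate transpose


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def expected_iterations(rho, max_iters=200):
    """Truncated expectation of the loop counter i of the coin-tossing program."""
    expectation = 0.0
    alive = 1.0  # probability that the loop is still running
    state = rho
    for n in range(1, max_iters + 1):
        state = matmul2(matmul2(H, state), H)   # q *= H
        p0 = state[0][0].real                   # outcome ff: the loop stops
        p1 = state[1][1].real                   # outcome tt: the loop continues
        expectation += alive * p0 * n
        alive *= p1
        state = [[0.0, 0.0], [0.0, 1.0]]        # post-measurement state |1><1|
    return expectation


rho_phi = [[1 / 3, math.sqrt(2) / 3], [math.sqrt(2) / 3, 2 / 3]]
assert abs(expected_iterations(rho_phi) - (2 - 2 * math.sqrt(2) / 3)) < 1e-9
assert abs(expected_iterations([[1, 0], [0, 0]]) - 2) < 1e-9
```

The truncation error is negligible here because the probability of still running after n iterations decays like $2^{-n}$.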


Operational Semantics. Following [6], we model the dynamics of our language 
as a probabilistic abstract reduction system (see [9]), a transition system where 
reduction is defined as a relation over probability distributions. 


Probabilistic abstract reduction systems. Given a subset $K$ of $\mathbb{R}$, let $K^{+}$ be the
set of non-negative numbers in $K$, i.e., $K^{+} \triangleq K \cap \{x \mid x \geq 0\}$ and let $K^{\infty}$ be
defined by $K^{\infty} \triangleq K \cup \{\infty\}$.

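A discrete (sub)distribution $\delta$ over a set $A$ is a function $\delta : A \to [0,1]$ with
countable support $\mathrm{supp}(\delta) \triangleq \{a \in A \mid \delta(a) \neq 0\}$ that maps an element $a$ of
$A$ to a probability $\delta(a)$ such that $|\delta| \triangleq \sum_{a \in \mathrm{supp}(\delta)} \delta(a) = 1$ ($|\delta| \leq 1$). Any
(sub)distribution $\delta$ can be written as $\{\delta(a) : a\}_{a \in \mathrm{supp}(\delta)}$. The set of subdistributions
over $A$, denoted by $\mathcal{D}(A)$, is closed under denumerable convex combinations
$\sum_i p_i \cdot \delta_i \triangleq \lambda a. \sum_i p_i \delta_i(a)$, with $p_i \in [0,1]$ and $\sum_i p_i \leq 1$. Slightly simplifying
standard notation, given $f : A \to \mathbb{R}^{+\infty}$ and a subdistribution $\delta \in \mathcal{D}(A)$, we define
$\mathbb{E}_{\delta}(f)$, the expectation of $f$ on $\delta$, by $\mathbb{E}_{\delta}(f) \triangleq \sum_{a \in \mathrm{supp}(\delta)} \delta(a) f(a)$. Note that
$\mathbb{E}_{\delta}(f) \in \mathbb{R}^{+\infty}$ is always defined, since the images of $f$ are non-negative reals.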
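
For finitely supported subdistributions, these notions can be written down directly. The sketch below is a minimal illustration, assuming subdistributions are Python dictionaries from objects to exact rational weights and that f takes finite values only (the value ∞ is not modeled); the names `mass`, `convex` and `expectation` are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Dict, Hashable, List, Tuple

Dist = Dict[Hashable, Fraction]  # finitely supported subdistribution


def mass(delta: Dist) -> Fraction:
    """|delta|, the total probability mass (at most 1 for a subdistribution)."""
    return sum(delta.values(), Fraction(0))


def convex(parts: List[Tuple[Fraction, Dist]]) -> Dist:
    """Convex combination sum_i p_i * delta_i, with sum_i p_i <= 1."""
    assert sum((p for p, _ in parts), Fraction(0)) <= 1
    out: Dist = {}
    for p, delta in parts:
        for a, w in delta.items():
            out[a] = out.get(a, Fraction(0)) + p * w
    return {a: w for a, w in out.items() if w != 0}  # keep the support only


def expectation(delta: Dist, f: Callable[[Hashable], Fraction]) -> Fraction:
    """E_delta(f) = sum over the support of delta(a) * f(a)."""
    return sum((w * f(a) for a, w in delta.items()), Fraction(0))


half = Fraction(1, 2)
delta = {"heads": half, "tails": Fraction(1, 4)}
mixed = convex([(half, {"a": Fraction(1)}), (half, {"a": half, "b": half})])
```

In this example `mass(delta)` is 3/4, so `delta` is a proper subdistribution, and `mixed` assigns 3/4 to "a" and 1/4 to "b".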

Bournez and Garnier [9] introduced the notion of Probabilistic Abstract Reduction
System (PARS) as a means to study reduction systems that evolve
probabilistically. A PARS $\to$ on $A$ is a binary relation $\cdot \to \cdot \subseteq A \times \mathcal{D}(A)$. The
intended meaning is that when $a \to \delta$, then $a$ reduces to $b \in \mathrm{supp}(\delta)$ with
probability $\delta(b)$. Here, we focus on deterministic PARSs, i.e., PARSs $\to$ with $a \to \delta_1$
and $a \to \delta_2$ implies $\delta_1 = \delta_2$. An object $a \in A$ is called terminal if there is no
rule $a \to \delta$, which we write as $a \not\to$.

Every deterministic PARS $\to$ over $A$ naturally lifts to a reduction relation
$\twoheadrightarrow$ over distributions so that $\delta \twoheadrightarrow \varepsilon$, if the reduct distribution $\varepsilon$ is obtained
from $\delta$ by replacing reducts in $\mathrm{supp}(\delta)$ according to the PARS $\to$. In fact, we
define this lifting in terms of a ternary relation $\cdot \overset{\cdot}{\twoheadrightarrow} \cdot \subseteq \mathcal{D}(A) \times \mathbb{R}^{+} \times \mathcal{D}(A)$ on
distributions, where in a step $\delta \overset{c}{\twoheadrightarrow} \varepsilon$ the weight $c$ signifies the probability that
a reduction has occurred. This relation is defined wrt. the following three rules.


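\[
\frac{a \not\to}{\{1 : a\} \overset{0}{\twoheadrightarrow} \{1 : a\}}
\qquad
\frac{a \to \delta}{\{1 : a\} \overset{1}{\twoheadrightarrow} \delta}
\qquad
\frac{\delta_i \overset{c_i}{\twoheadrightarrow} \varepsilon_i \qquad \sum_i p_i \leq 1}{\sum_i p_i \cdot \delta_i \overset{\sum_i p_i c_i}{\twoheadrightarrow} \sum_i p_i \cdot \varepsilon_i}
\]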


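We may sometimes use the n-fold ($n > 0$) composition of $\twoheadrightarrow$, denoted $\twoheadrightarrow^{n}$, given
by $\delta \overset{c}{\twoheadrightarrow}{}^{n} \varepsilon$ if $\delta \overset{c_1}{\twoheadrightarrow} \cdots \overset{c_n}{\twoheadrightarrow} \varepsilon$ and the weights satisfy $c = \sum_i c_i$. Notice that
since $\to$ is deterministic, so is $\twoheadrightarrow$, in the sense that $\delta \overset{c_1}{\twoheadrightarrow} \varepsilon_1$ and $\delta \overset{c_2}{\twoheadrightarrow} \varepsilon_2$ implies
$c_1 = c_2$ and $\varepsilon_1 = \varepsilon_2$. Thus, in particular, for every $a \in A$ there is precisely one
(infinite) reduction

\[ \{1 : a\} = \delta_0 \overset{c_0}{\twoheadrightarrow} \delta_1 \overset{c_1}{\twoheadrightarrow} \delta_2 \overset{c_2}{\twoheadrightarrow} \cdots \]

For any $b \in A$, $\delta_i(b)$ gives the probability that $a$ reduces to $b$ in $i$ steps. Note
that when $b$ is terminal, this probability only increases along reductions (i.e.,
$\delta_i(b) \leq \delta_{i+1}(b)$ for all $i$). This justifies that we define the terminal distribution
of $a$ as the distribution $\delta(b) \triangleq \lim_{i \to \infty} \delta_i(b)$. Note that $\delta(b)$ gives the probability
that $a$ reaches $b$ in an arbitrary (but finite) number of steps. Since the weights
$c_i$ indicate the probability that a step has been performed from $\delta_i$ to $\delta_{i+1}$, the
infinite sum $\sum_{i \geq 0} c_i \in \mathbb{R}^{+\infty}$ gives the expected number of reduction steps carried
out, the expected derivation length of $a$ [5].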

For a PARS $\to$, we denote by $\mathrm{term}_{\to} : A \to \mathcal{D}(A)$ the function associating
with each $a \in A$ its terminal distribution. The expected derivation length function
$\mathrm{edl}_{\to} : A \to \mathbb{R}^{+\infty}$ associates each $a \in A$ to its expected derivation length. The
PARS $\to$ is almost surely terminating [40] (a.s. terminating for short) if $a \in A$
reduces to a terminal object $b \not\to$ with probability 1, that is, if $|\mathrm{term}_{\to}(a)| = 1$
for every $a$. It is positive almost surely terminating, if the expected derivation
length is always finite, that is, $\mathrm{edl}_{\to}(a) < \infty$ for all $a \in A$.
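
The lifting of a deterministic PARS to distributions, together with finite approximations of the terminal distribution and of the expected derivation length, can be sketched on a toy system. In the Python sketch below, the PARS has a single non-terminal object "loop" that either stays or moves to the terminal object "done" with probability 1/2 each; this system and the names `step`, `lift` and `run` are assumptions made for illustration only.

```python
from fractions import Fraction


def step(a):
    """A toy deterministic PARS: returns the reduct distribution, or None if a is terminal."""
    if a == "loop":
        return {"loop": Fraction(1, 2), "done": Fraction(1, 2)}
    return None


def lift(delta):
    """One lifted step on distributions; returns (weight c, reduct distribution)."""
    eps = {}
    weight = Fraction(0)
    for a, p in delta.items():
        reduct = step(a)
        if reduct is None:            # terminal objects stay put and contribute weight 0
            eps[a] = eps.get(a, 0) + p
        else:                         # non-terminal objects reduce and contribute weight p
            weight += p
            for b, q in reduct.items():
                eps[b] = eps.get(b, 0) + p * q
    return weight, eps


def run(a, n):
    """Approximate the terminal distribution and expected derivation length after n steps."""
    delta = {a: Fraction(1)}
    total = Fraction(0)
    for _ in range(n):
        c, delta = lift(delta)
        total += c
    terminal = {b: p for b, p in delta.items() if step(b) is None}
    return terminal, total


terminal, total = run("loop", 10)
```

After 10 steps the terminal mass is 1023/1024 and the accumulated weight is 1023/512; in the limit these tend to 1 and 2, so "loop" is both almost surely and positively almost surely terminating in this toy system.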

Apart from termination, we are interested also in questions related to functional
correctness, such as (i) what is the probability that $a$ reaches a terminal
$b$, (ii) what is the probability that $a$ reaches a terminal satisfying predicate $P$,
and more generally, (iii) which value does a function $f : A \to \mathbb{R}^{+\infty}$ take, in
expectation, when fully reducing an object $a$. In the literature [32], one tool to
answer all of these is given by weakest pre-expectation transformers, the natural
generalization of classical weakest pre-condition transformers to a quantitative,
probabilistic setting. We adapt this notion to PARSs.


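Definition 1 (Weakest pre-expectation). The weakest pre-expectation for
a PARS $\to$ over $A$ is given by the function

\[
\begin{array}{l}
\mathrm{wp}_{\to} : (A \to \mathbb{R}^{+\infty}) \to (A \to \mathbb{R}^{+\infty}) \\
\mathrm{wp}_{\to} \triangleq \lambda f. \lambda a.\ \mathbb{E}_{\mathrm{term}_{\to}(a)}(f).
\end{array}
\]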


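For $\mathbf{1}_b$, the indicator function evaluating to 1 on argument $b$ and to 0 otherwise,
and by seeing a predicate $P$ as a $\{0,1\}$-valued function, $\mathrm{wp}_{\to}\ \mathbf{1}_b\ a$ answers
question (i), $\mathrm{wp}_{\to}\ P\ a$ answers (ii), and generally $\mathrm{wp}_{\to}\ f\ a$ answers question (iii).
Note also that a PARS is a.s. terminating iff $\mathrm{wp}_{\to}\ (\lambda b. 1)\ a = 1$ for each $a \in A$.
On the other hand, positive a.s. termination cannot be expressed through an
application of $\mathrm{wp}_{\to}$.

\[
\begin{array}{c}
\dfrac{}{(\mathtt{skip}, s, \rho) \to \{1 : (\downarrow, s, \rho)\}}\ \text{(Skip)}
\qquad
\dfrac{}{(x = e, s, \rho) \to \{1 : (\downarrow, s[x := [\![e]\!]^{s}], \rho)\}}\ \text{(Exp)}
\\[2ex]
\dfrac{}{(\overline{q}\ \mathtt{*\!=}\ \mathtt{U}, s, \rho) \to \{1 : (\downarrow, s, \Phi_{U_{\overline{q}}}(\rho))\}}\ \text{(Op)}
\\[2ex]
\dfrac{}{(x = \mathtt{meas}\ q_i, s, \rho) \to \{\mathrm{tr}(M_{k,i}\rho) : (\downarrow, s[x := k], m_{k,i}(\rho))\}_{k \in \{0,1\}}}\ \text{(Meas)}
\\[2ex]
\dfrac{(\mathsf{stm}_1, s, \rho) \to \{p_i : (\mathsf{stm}_1^{i}, s^{i}, \rho^{i})\}_{i \in I}}{(\mathsf{stm}_1; \mathsf{stm}_2, s, \rho) \to \{p_i : (\mathsf{stm}_1^{i}; \mathsf{stm}_2, s^{i}, \rho^{i})\}_{i \in I}}\ \text{(Seq)}
\\[2ex]
\dfrac{[\![b]\!]^{s} \in \{0,1\}}{(\mathtt{if}\ b\ \mathtt{then}\ \mathsf{stm}_1\ \mathtt{else}\ \mathsf{stm}_0, s, \rho) \to \{1 : (\mathsf{stm}_{[\![b]\!]^{s}}, s, \rho)\}}\ \text{(Cond)}
\\[2ex]
\dfrac{[\![b]\!]^{s} = 0}{(\mathtt{while}\ b\ \mathtt{do}\ \mathsf{stm}, s, \rho) \to \{1 : (\downarrow, s, \rho)\}}\ (\text{Wh}_0)
\\[2ex]
\dfrac{[\![b]\!]^{s} = 1}{(\mathtt{while}\ b\ \mathtt{do}\ \mathsf{stm}, s, \rho) \to \{1 : (\mathsf{stm}; \mathtt{while}\ b\ \mathtt{do}\ \mathsf{stm}, s, \rho)\}}\ (\text{Wh}_1)
\end{array}
\]

Fig. 4. Operational semantics in terms of PARS.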


Quantum programs as PARSs. We now endow quantum programs with an operational
semantics defined in terms of a PARS. Given a totally ordered set of
qubits $Q = \{q_1, \ldots, q_n\}$, let $\mathcal{H}_Q$ be the $2^n$-dimensional Hilbert space defined by
$\mathcal{H}_Q \triangleq \bigotimes_{i=1}^{n} \mathcal{H}_{q_i}$, with $\mathcal{H}_q \triangleq \mathbb{C}^2$ being the vector space of computational basis
$\{|0\rangle, |1\rangle\}$ and $\otimes$ being the tensor product. With $\langle k|$ we denote the transpose
conjugate of $|k\rangle$, for $k \in \{0,1\}$. Let $\mathcal{M}(\mathcal{H}_Q)$ be the set of complex square matrices
acting on the Hilbert space $\mathcal{H}_Q$, i.e., $\mathcal{M}(\mathcal{H}_Q) \triangleq \mathbb{C}^{2^n \times 2^n}$. Given $M \in \mathcal{M}(\mathcal{H}_Q)$,
$M^{\dagger}$ denotes the transpose conjugate of $M$, and $I_{2^n}$ denotes the identity matrix
over $\mathcal{M}(\mathcal{H}_Q)$. We will write $I$ when the dimension is clear from the context.

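Let $\mathcal{D}(\mathcal{H}_Q) \subseteq \mathcal{M}(\mathcal{H}_Q)$ be the set of all density operators (or quantum states),
i.e., positive semi-definite matrices of trace equal to 1 on $\mathcal{H}_Q$. Density operators
can be viewed as the mathematical representation of a (mixed) quantum state.
A unitary operator $U$ is a matrix in $\mathcal{M}(\mathcal{H}_Q)$ such that $UU^{\dagger} = U^{\dagger}U = I$. A
superoperator $\Phi_U : \mathcal{D}(\mathcal{H}_Q) \to \mathcal{D}(\mathcal{H}_Q)$, an endomorphism over density operators,
is attached to each unitary operator $U$ and defined by $\Phi_U \triangleq \lambda\rho. U\rho U^{\dagger}$.
By definition, $\Phi_U$ is a completely positive trace preserving linear map. Indeed,
$\mathrm{tr}(U\rho U^{\dagger}) = \mathrm{tr}(\rho)$, by unitarity. Hence $U\rho U^{\dagger}$ is a density operator in $\mathcal{D}(\mathcal{H}_Q)$ for
each $\rho \in \mathcal{D}(\mathcal{H}_Q)$.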
Regarding measurements, for each $i$, $1 \leq i \leq \mathrm{card}(Q)$, we define $M_{k,i} \in
\mathcal{M}(\mathcal{H}_Q)$, with $k \in \{0,1\}$, by $M_{0,i} \triangleq I_{2^{i-1}} \otimes (|0\rangle\langle 0|) \otimes I_{2^{n-i}}$ and $M_{1,i} \triangleq I - M_{0,i}$.
The measurement of the qubit $q_i$ (in the computational basis) of a density matrix
$\rho \in \mathcal{D}(\mathcal{H}_Q)$, produces the classical outcome $k \in \{0,1\}$ with probability $\mathrm{tr}(M_{k,i}\rho)$.
The (normalized) quantum state, after the measurement, is defined by


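\[
m_{k,i}(\rho) \triangleq
\begin{cases}
\dfrac{M_{k,i}\,\rho\,M_{k,i}^{\dagger}}{\mathrm{tr}(M_{k,i}\rho)} & \text{if } \mathrm{tr}(M_{k,i}\rho) \neq 0, \\[2ex]
\dfrac{I_{2^n}}{2^n} & \text{otherwise.}
\end{cases}
\]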

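Note that for all $\rho \in \mathcal{D}(\mathcal{H}_Q)$, $m_{k,i}(\rho) \in \mathcal{D}(\mathcal{H}_Q)$, as it holds that $\mathrm{tr}(m_{k,i}(\rho)) = 1$.
Indeed, $\mathrm{tr}(M_{k,i}\rho M_{k,i}^{\dagger}) = \mathrm{tr}(M_{k,i}^{2}\rho) = \mathrm{tr}(M_{k,i}\rho)$, as $M_{k,i}$ is a projection. Hence
$m_{k,i}$ is a map in $\mathcal{D}(\mathcal{H}_Q) \to \mathcal{D}(\mathcal{H}_Q)$.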
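
The measurement operators and the normalized post-measurement state can be made concrete for small registers. The Python sketch below is a minimal illustration with exact rational arithmetic: matrices are nested lists, qubits are numbered from 1 as in the text, and the helper names are hypothetical. The zero-probability case is deliberately left unspecified in the sketch (it returns `None` for the state).

```python
from fractions import Fraction


def identity(d):
    return [[Fraction(int(r == c)) for c in range(d)] for r in range(d)]


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def trace(m):
    return sum(m[r][r] for r in range(len(m)))


def measurement_operator(k, i, n):
    """M_{k,i}: the projector |k><k| on qubit i, identity on the other n - 1 qubits."""
    proj = [[Fraction(1 - k), Fraction(0)], [Fraction(0), Fraction(k)]]
    return kron(kron(identity(2 ** (i - 1)), proj), identity(2 ** (n - i)))


def measure(rho, k, i, n):
    """Probability of outcome k on qubit i, and the normalized post-measurement state."""
    m = measurement_operator(k, i, n)
    prob = trace(matmul(m, rho))
    if prob == 0:
        return prob, None  # zero-probability branch: not modeled in this sketch
    unnormalized = matmul(matmul(m, rho), m)  # M is real and symmetric, so M† = M
    return prob, [[x / prob for x in row] for row in unnormalized]


half = Fraction(1, 2)
bell = [[half, 0, 0, half], [0, 0, 0, 0], [0, 0, 0, 0], [half, 0, 0, half]]
prob0, post0 = measure(bell, 0, 1, 2)
prob1, post1 = measure(bell, 1, 1, 2)
```

Measuring the first qubit of this two-qubit Bell state yields each outcome with probability 1/2, and both post-measurement states have trace 1, in line with the remark that $m_{k,i}$ maps density operators to density operators.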

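We set $\llbracket \mathcal{B} \rrbracket \triangleq \{0,1\}$ and $\llbracket \mathcal{N} \rrbracket \triangleq \mathbb{N}$. The classical state is modeled as a (well-typed)
store $s$ of domain $\mathrm{dom}(s)$ mapping each variable $x$ of type $\mathcal{K}$ to a value
in $\llbracket \mathcal{K} \rrbracket$. With $\mathsf{Store}$, we denote the set of all such stores. Let $s[x := k]$ with
$k \in \llbracket \mathcal{K} \rrbracket$ be the store obtained from $s$ by updating the value assigned to $x$ in
the map $s$. Given a store $s$, let $\llbracket - \rrbracket^s : \mathcal{K}\mathsf{Exp} \to \llbracket \mathcal{K} \rrbracket$ be the map associating to
each expression $e$ of type $\mathcal{K}$, such that $\mathcal{B}(e) \cup \mathcal{N}(e) \subseteq \mathrm{dom}(s)$, a value in
$\llbracket \mathcal{K} \rrbracket$, defined in the obvious way. For example $\llbracket x \rrbracket^s \triangleq s(x)$, $\llbracket n \rrbracket^s \triangleq n$, $\llbracket \mathtt{tt} \rrbracket^s \triangleq 1$,
$\llbracket n_1 - n_2 \rrbracket^s \triangleq \max(0, \llbracket n_1 \rrbracket^s - \llbracket n_2 \rrbracket^s)$, etc.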
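The store and the evaluation map can be sketched as follows. This is an illustration under assumptions: expressions are encoded as small tuples, stores as dictionaries, and only the cases listed in the examples above are covered; the names `evaluate` and `update` are hypothetical.

```python
def evaluate(expr, store):
    """Evaluate a tiny expression AST in a store (a dict from variable names to values)."""
    kind = expr[0]
    if kind == "var":
        return store[expr[1]]
    if kind == "num":
        return expr[1]
    if kind == "tt":
        return 1
    if kind == "ff":
        return 0
    if kind == "sub":
        # Subtraction on naturals is truncated at 0.
        return max(0, evaluate(expr[1], store) - evaluate(expr[2], store))
    raise ValueError(f"unsupported expression: {kind}")

def update(store, name, value):
    """The store s[x := k]; the original store is left unchanged."""
    return {**store, name: value}

s = update({"i": 5}, "x", evaluate(("tt",), {}))
print(evaluate(("sub", ("num", 2), ("var", "i")), s))
```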

Let $\downarrow$ be a special symbol for termination. A configuration $\mu$, for (extended)
statement $\mathit{stm} \in \mathsf{Stmt} \cup \{\downarrow\}$, store $s \in \mathsf{Store}$, and a quantum state $\rho \in \mathcal{H}_Q$,
has the form $(\mathit{stm}, s, \rho)$. Let $\mathsf{Conf}$ be the set of configurations. A configuration
$(\mathit{stm}, s, \rho)$ is well-formed with respect to the sets of variables $B$, $V$, and $Q$ if
$\mathcal{B}(\mathit{stm}) \subseteq B$, $\mathcal{N}(\mathit{stm}) \subseteq V$, $\mathcal{Q}(\mathit{stm}) \subseteq Q$, $\mathrm{dom}(s) = B \cup V$, and $\rho \in \mathcal{D}(\mathcal{H}_Q)$.
Throughout the paper, we only consider configurations that are well-formed
with respect to the sets of variables of the program under consideration.

The operational semantics is described in Figure 4 as a PARS $\to$ over objects
in $\mathsf{Conf}$, where terminal objects are precisely the configurations of the
shape $(\downarrow, s, \rho)$. The (classical or quantum) state of a configuration can only
be updated by the three rules (Exp), (Op), and (Meas). Rule (Exp) updates
the classical store with respect to the value of the evaluated expression. Rule (Op)
updates the quantum state to a new quantum state $\Phi_{U_Q}(\rho) = U_Q\,\rho\,U_Q^\dagger$, where
$U_Q$ is the unitary operator in $\mathcal{M}(\mathcal{H}_Q)$ computed by extending the quantum
gate $U$ to the entire set of qubits $Q$. Rule (Meas) performs a measurement on
qubit $q_i$. This rule returns a distribution of configurations corresponding to the
two possible outcomes, $k = 0$ and $k = 1$, with their respective probabilities
$\mathrm{tr}(M_{k,i}\rho)$ and, in each case, updates the classical store and the quantum state
accordingly. In the particular case where $\mathrm{tr}(M_{k_0,i}\rho) = 0$ for some $k_0 \in \{0,1\}$,
\[
\{\mathrm{tr}(M_{k,i}\rho) : (\downarrow, s[x := k], m_{k,i}(\rho))\}_{k \in \{0,1\}} = \{1 : (\downarrow, s[x := 1 - k_0], m_{1-k_0,i}(\rho))\}.
\]
Rule (Seq) governs the execution of a sequence of statements $\mathit{stm}_1; \mathit{stm}_2$, under
the convention that $\downarrow; \mathit{stm} = \mathit{stm}$, for each statement $\mathit{stm}$. The rule accounts
for potential probabilistic behavior when $\mathit{stm}_1$ performs a measurement and it
is otherwise standard. All the other rules are standard.

In a configuration $\mu = (\mathit{stm}, s, \rho)$, the pair $\sigma = (s, \rho)$ is called a state. Let
$\mathsf{St}^{\mathit{stm}}$ be the set of states $\sigma, \tau, \ldots$ that are well-formed with respect to statement $\mathit{stm}$. For
simplicity, we will denote this set by $\mathsf{St}$ when $\mathit{stm}$ is clear from the context. To
ease the presentation, we sometimes write $(\mathit{stm}, \sigma)$ for the configuration $\mu$.

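We will be interested in expectation-based reasoning on quantum programs.
In what follows, we also call functions $f : \mathsf{Conf} \to \mathbb{R}^{+\infty}$ expectations, for brevity.


Definition 2. For a statement $\mathit{stm}$ and $f : \mathsf{St} \to \mathbb{R}^{+\infty}$, we overload the notions
of expected derivation length and weakest pre-expectation by:

\begin{align*}
\mathrm{edl}_{\mathit{stm}} &: \mathsf{St} \to \mathbb{R}^{+\infty} & \mathrm{qwp}_{\mathit{stm}} &: (\mathsf{St} \to \mathbb{R}^{+\infty}) \to (\mathsf{St} \to \mathbb{R}^{+\infty})\\
\mathrm{edl}_{\mathit{stm}} &\triangleq \lambda\sigma.\,\mathrm{edl}_{\to}(\mathit{stm}, \sigma) & \mathrm{qwp}_{\mathit{stm}} &\triangleq \lambda f.\lambda\sigma.\,\mathrm{wp}_{\to}(f_{\mathsf{St}})(\mathit{stm}, \sigma),
\end{align*}
where $f_{\mathsf{St}}(\mathit{stm}, \tau) \triangleq f(\tau)$.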


Example 2. Consider the program Cntoss given in Figure 3. In the setting of the
program Cntoss, $Q = \{q\}$, $M_{0,1} = \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}$ and $M_{1,1} = \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}$. On an initial state $\sigma =
(s, \rho)$, the reduction starts deterministically as in the classical setting, performing
the initialization x = tt and i = 0. From there, evaluation reaches the loop
while x do stm. At each loop iteration, the loop counter i is incremented,
and the Hadamard gate is applied to the quantum variable q. The loop guard is
obtained by measuring q.

To see how this is reflected in the semantics, let us first look at an iteration
of the loop. If x was set to false, that is, x holds the value 0, then by rule (Wh$_0$) the
loop terminates within one step:

\[
\{1 : (\texttt{while x do stm}, [x{:=}0, i{:=}i], \rho)\} \to \{1 : (\downarrow, [x{:=}0, i{:=}i], \rho)\}. \tag{0}
\]

On the other hand, when x was previously set to true, the loop executes its body.
Precisely, we have:

\begin{align*}
&\{1 : (\texttt{while x do stm}, [x{:=}1, i{:=}i], \rho)\}\\
&\to \{1 : (\texttt{i = i+1; q *= H; x = meas q; while x do stm}, [x{:=}1, i{:=}i], \rho)\} \tag{1}\\
&\to \{1 : (\texttt{q *= H; x = meas q; while x do stm}, [x{:=}1, i{:=}i+1], \rho)\} \tag{2}\\
&\to \{1 : (\texttt{x = meas q; while x do stm}, [x{:=}1, i{:=}i+1], \Phi_H(\rho))\} \tag{3}\\
&\to \{p_k : (\texttt{while x do stm}, [x{:=}k, i{:=}i+1], \rho_k)\}_{k \in \{0,1\}}, \tag{4}
\end{align*}

where in the last step, the probability $p_k$ equals $\mathrm{tr}(M_{k,1}\Phi_H(\rho))$, while the normalized
quantum state $\rho_k$ is given as $m_{k,1}(\Phi_H(\rho))$. The above reduction is obtained
by applying the rules of Figure 4: rule (Wh$_1$) for reduction (1); rules (Exp) and
(Seq) for reduction (2); rules (Op) and (Seq) for reduction (3); and finally rules
(Meas) and (Seq) for reduction (4).


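For an arbitrary initial quantum state $\rho = \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix} \in \mathcal{D}(\mathcal{H}_Q)$ (where $\alpha, \beta, \gamma, \delta \in
\mathbb{C}$ and $\mathrm{tr}(\rho) = \alpha + \delta = 1$, $\gamma = \overline{\beta}$, etc.), it follows that

\[
p_0 = \mathrm{tr}(M_{0,1} H \rho H^\dagger)
= \mathrm{tr}\left( \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix} \frac{1}{2} \begin{pmatrix} \alpha+\beta+\gamma+\delta & \alpha-\beta+\gamma-\delta\\ \alpha+\beta-\gamma-\delta & \alpha-\beta-\gamma+\delta \end{pmatrix} \right)
= \frac{1 + \beta + \gamma}{2}
\]

and that $p_1 = 1 - p_0 = \frac{1 - (\beta + \gamma)}{2}$. Using $\rho_k = \frac{M_{k,1} H \rho H^\dagger M_{k,1}^\dagger}{\mathrm{tr}(M_{k,1} H \rho H^\dagger)} = \frac{(M_{k,1}H)\,\rho\,(M_{k,1}H)^\dagger}{p_k}$, we obtain

\begin{align*}
\rho_0 &= \frac{1}{p_0} \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2}\\ 0 & 0 \end{pmatrix} \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix} \begin{pmatrix} 1/\sqrt{2} & 0\\ 1/\sqrt{2} & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix},\\
\rho_1 &= \frac{1}{p_1} \begin{pmatrix} 0 & 0\\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix} \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix} \begin{pmatrix} 0 & 1/\sqrt{2}\\ 0 & -1/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}.
\end{align*}

Summarizing (1)-(4) we thus get:

\begin{align*}
&\{1 : (\texttt{while x do stm}, [x{:=}1, i{:=}i], \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix})\}\\
&\to^4 \{p_0 : (\texttt{while x do stm}, [x{:=}0, i{:=}i+1], \rho_0),\ p_1 : (\texttt{while x do stm}, [x{:=}1, i{:=}i+1], \rho_1)\}.
\end{align*}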


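Putting everything together, we have

\begin{align*}
(\texttt{Cntoss}, s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix})
&\to^2 \{1 : (\texttt{while x do stm}, [x{:=}1, i{:=}0], \rho)\}\\
&\to^4 \{p_0 : (\texttt{while x do stm}, [x{:=}0, i{:=}1], \rho_0),\\
&\qquad\ p_1 : (\texttt{while x do stm}, [x{:=}1, i{:=}1], \rho_1)\}\\
&\to^4 \{p_0 : \underline{(\downarrow, [x{:=}0, i{:=}1], \rho_0)},\\
&\qquad\ \tfrac{p_1}{2} : (\texttt{while x do stm}, [x{:=}0, i{:=}2], \rho_0),\\
&\qquad\ \tfrac{p_1}{2} : (\texttt{while x do stm}, [x{:=}1, i{:=}2], \rho_1)\}\\
&\to^4 \{p_0 : \underline{(\downarrow, [x{:=}0, i{:=}1], \rho_0)},\\
&\qquad\ \tfrac{p_1}{2} : \underline{(\downarrow, [x{:=}0, i{:=}2], \rho_0)},\\
&\qquad\ \tfrac{p_1}{4} : (\texttt{while x do stm}, [x{:=}0, i{:=}3], \rho_0),\\
&\qquad\ \tfrac{p_1}{4} : (\texttt{while x do stm}, [x{:=}1, i{:=}3], \rho_1)\}\\
&\to^4 \cdots
\end{align*}

where terminal configurations are underlined. This reduction converges to the
terminal distribution

\[
\mathrm{term}_{\texttt{Cntoss}}(s, \rho) = \{p_0 : (\downarrow, [x{:=}0, i{:=}1], \rho_0)\} + \{\tfrac{p_1}{2^i} : (\downarrow, [x{:=}0, i{:=}i+1], \rho_0)\}_{i \ge 1},
\]
with an expected derivation length of

\[
\mathrm{edl}_{\texttt{Cntoss}}(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) = 2 + 4 + (p_0 + 4p_1) + \sum_{i=1}^{\infty} \frac{5\,p_1}{2^i} = 7 + 8p_1 = 11 - 4(\beta + \gamma).
\]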

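For the expectation $f(s, \rho) = s(i)$, measuring the iteration counter i, we have

\[
\mathrm{qwp}_{\texttt{Cntoss}}\, f\, (s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) = p_0 \times 1 + \sum_{i=1}^{\infty} \frac{p_1}{2^i}\,(i+1) = p_0 + p_1 + 2p_1 = 2 - (\beta + \gamma),
\]

that is, the mean value held by i after execution is $2 - (\beta + \gamma)$. The
termination probability is

\[
\mathrm{qwp}_{\texttt{Cntoss}}\, (\lambda\sigma.1)\, (s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) = p_0 \times 1 + \sum_{i=1}^{\infty} \frac{p_1}{2^i} \times 1 = p_0 + p_1 = 1,
\]

i.e., the program is almost surely terminating.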
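The three quantities of Example 2 can be cross-checked numerically. The sketch below is an illustration, not part of the formal development: it assumes that $\beta + \gamma$ is rational, unfolds the loop for a bounded number of iterations, and accumulates partial sums; the function name `cntoss_summary` is hypothetical.

```python
from fractions import Fraction as Fr

def cntoss_summary(beta_plus_gamma, iterations=60):
    """Unfold the loop of Cntoss for a bounded number of iterations.

    Returns partial sums of (termination mass, E[i], expected derivation length).
    """
    p0 = (1 + beta_plus_gamma) / 2       # first measurement yields 0
    p1 = 1 - p0                          # first measurement yields 1
    edl = Fr(6)                          # 2 initialization steps + 4 steps of iteration 1
    term_mass = Fr(0)
    exp_i = Fr(0)
    stop, go = p0, p1                    # mass with x = 0 / x = 1 after iteration 1
    for i in range(1, iterations + 1):
        term_mass += stop                # the x = 0 branch exits with counter value i
        exp_i += stop * i
        edl += stop + 4 * go             # exiting costs 1 step, iterating costs 4
        stop, go = go / 2, go / 2        # from |1><1| the Hadamard gate yields a fair coin
    return term_mass, exp_i, edl

term_mass, exp_i, edl = cntoss_summary(Fr(1, 2))
print(float(term_mass), float(exp_i), float(edl))
```

For $\beta + \gamma = 1/2$ the partial sums approach $1$, $2 - 1/2$, and $11 - 4 \cdot 1/2$, in line with the closed forms above.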
3 Weakest Pre-expectations and Arithmetical Hierarchy


In this section, we study the hardness of some natural quantitative problems for 
weakest pre-expectations and expected derivation length. 


Computability-Aimed Restrictions. This subsection is devoted to putting 
some restrictions on programs and on the considered notion of expectation to 
overcome the issues of computability, mentioned in the introduction. 


Algebraic numbers. Towards this end, our solution is to target a subset of complex
numbers, where simple operations like equality are decidable. We consider
the set $\overline{\mathbb{Q}}$ of algebraic numbers, i.e., complex numbers in $\mathbb{C}$ that are roots of a non-zero
polynomial in $\mathbb{Q}[X]$. Let $\mathbb{A} \triangleq \overline{\mathbb{Q}} \cap \mathbb{R}$ be the real closed field of real algebraic
numbers in $\mathbb{R}$. The following inclusions trivially hold: (i) $\mathbb{N} \subseteq \mathbb{Q} \subseteq \mathbb{A} \subseteq \mathbb{R} \subseteq \mathbb{C}$
and (ii) $\overline{\mathbb{Q}} \subseteq \mathbb{C}$. It was proved in [18, Proposition 2.2] that equality over $\overline{\mathbb{Q}}$ and
inequality over $\mathbb{A}$ are decidable using Cohn's representation [12]. It is well-known
that the product and sum over $\overline{\mathbb{Q}}$ are computable in polynomial time.
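As a brief illustration of the definition, the constants that appear in common gate matrices are algebraic. The following standard identities exhibit, for each of them, a non-zero polynomial with rational coefficients that it annihilates:

```latex
\[
2X^2 - 1 = 0 \ \text{ for } X = \tfrac{1}{\sqrt{2}}, \qquad
X^2 + 1 = 0 \ \text{ for } X = i, \qquad
X^4 + 1 = 0 \ \text{ for } X = e^{i\pi/4}.
\]
```

In particular, $\tfrac{1}{\sqrt{2}} \in \mathbb{A}$, whereas $i$ and $e^{i\pi/4}$ are algebraic but not real.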

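We now restrict the program semantics to matrices and density operators over
algebraic numbers. Given a totally ordered set of qubits $Q = \{q_1, \ldots, q_n\}$, let
$\tilde{\mathcal{H}}_Q$ be the Hausdorff pre-Hilbert space (i.e., the completeness requirement
on Hilbert spaces is withdrawn) of $n$ qubits defined by $\tilde{\mathcal{H}}_Q \triangleq \bigotimes_{i=1}^{n} \tilde{\mathcal{H}}_{q_i}$ with
$\tilde{\mathcal{H}}_{q_i} \triangleq \overline{\mathbb{Q}}^2$ being the vector space of computational basis $\{|0\rangle, |1\rangle\}$ over the field
$\overline{\mathbb{Q}}$. Let $\mathcal{M}(\tilde{\mathcal{H}}_Q)$ and $\mathcal{D}(\tilde{\mathcal{H}}_Q)$ be the set of matrices and density operators on $\tilde{\mathcal{H}}_Q$,
respectively.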


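Clifford+T gates. For the program semantics to be defined on the space $\mathcal{D}(\tilde{\mathcal{H}}_Q)$,
the considered quantum gates are now restricted to gates whose corresponding
unitary operators are in $\mathcal{M}(\tilde{\mathcal{H}}_Q)$, i.e., have a matrix representation over the
algebraic numbers. To this end, we consider a restriction to the Clifford+T gates:
I, X, Y, Z, H, S, CNOT, and T, whose unitary matrices are given below:

\begin{align*}
I &\triangleq \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}, &
X &\triangleq \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, &
Y &\triangleq \begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix}, &
Z &\triangleq \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix},\\
H &\triangleq \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}, &
S &\triangleq \begin{pmatrix} 1 & 0\\ 0 & i \end{pmatrix}, &
T &\triangleq \begin{pmatrix} 1 & 0\\ 0 & e^{i\pi/4} \end{pmatrix}, &
\mathrm{CNOT} &\triangleq \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix}.
\end{align*}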


The Clifford+T fragment is the set of unitary transformations generated by se- 
quential (matrix multiplication) and parallel (Kronecker product) compositions 
of the gates H, S, CNOT, and T. This constitutes a reasonable restriction for 
unitary operators as it is known to be the simplest approximately universal 
fragment of quantum mechanics [1]. 

A central observation is that the superoperator associated with a unitary
operator of the Clifford+T fragment is an endomorphism over density operators
in $\mathcal{D}(\tilde{\mathcal{H}}_Q)$.

Lemma 1. The Clifford+T fragment preserves $\mathcal{D}(\tilde{\mathcal{H}}_Q)$, i.e., for each unitary
operator $U$ of the Clifford+T fragment, $\Phi_U \in \mathcal{D}(\tilde{\mathcal{H}}_Q) \to \mathcal{D}(\tilde{\mathcal{H}}_Q)$.


Notice that, while a restriction to Clifford+T is reasonable in terms of quantum
mechanics and universality, our result can be extended by adding any quantum
gate preserving the above lemma. For example, the phase shift gate, defined
by $P_\varphi \triangleq \begin{pmatrix} 1 & 0\\ 0 & e^{i\varphi} \end{pmatrix}$, preserves $\mathcal{D}(\tilde{\mathcal{H}}_Q)$ whenever $\varphi = r\pi$, for any $r \in \mathbb{Q}$.
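To make the role of algebraic entries concrete, the sketch below performs exact arithmetic in the field $\mathbb{Q}(\zeta)$ with $\zeta = e^{i\pi/4}$, a subfield of the algebraic numbers that contains every entry of the H, S, and T matrices. It is an illustration only: the class `Cyc8` and the helper `matmul` are hypothetical names, and the representation (four rational coefficients, reduced with $\zeta^4 = -1$) is a deliberately simple choice, not Cohn's representation cited above.

```python
from fractions import Fraction as Fr

class Cyc8:
    """Element c0 + c1*z + c2*z^2 + c3*z^3 of Q(z), where z = exp(i*pi/4) and z^4 = -1."""
    def __init__(self, c0=0, c1=0, c2=0, c3=0):
        self.c = tuple(Fr(x) for x in (c0, c1, c2, c3))
    def __add__(self, other):
        return Cyc8(*(a + b for a, b in zip(self.c, other.c)))
    def __mul__(self, other):
        out = [Fr(0)] * 4
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c):
                k = i + j
                if k < 4:
                    out[k] += a * b
                else:
                    out[k - 4] -= a * b      # reduce with z^4 = -1
        return Cyc8(*out)
    def __eq__(self, other):                 # equality is a coefficient comparison
        return self.c == other.c
    def __repr__(self):
        return f"Cyc8{self.c}"

def matmul(a, b):
    """Product of square matrices with Cyc8 entries."""
    n = len(a)
    return [[sum((a[i][k] * b[k][j] for k in range(n)), Cyc8()) for j in range(n)]
            for i in range(n)]

ZERO, ONE = Cyc8(), Cyc8(1)
IMAG = Cyc8(0, 0, 1)                          # i = z^2
ZETA = Cyc8(0, 1)                             # exp(i*pi/4)
INV_SQRT2 = Cyc8(0, Fr(1, 2), 0, Fr(-1, 2))   # 1/sqrt(2) = (z - z^3)/2
NEG_INV_SQRT2 = INV_SQRT2 * Cyc8(-1)

H = [[INV_SQRT2, INV_SQRT2], [INV_SQRT2, NEG_INV_SQRT2]]
S = [[ONE, ZERO], [ZERO, IMAG]]
T = [[ONE, ZERO], [ZERO, ZETA]]

print(matmul(H, H) == [[ONE, ZERO], [ZERO, ONE]])   # H is self-inverse
print(matmul(T, T) == S)                            # T squared is S
```

Because every element has a unique coefficient tuple, equality tests such as $T \cdot T = S$ are decided exactly, without floating-point rounding.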

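Let $\mathsf{Stmt}_{CT}$ be the set of statements restricted to quantum gates computing
Clifford+T unitary operators (hence a subset of $\mathsf{Stmt}$), $\mathsf{St}_{CT}$ be the set of states
whose quantum state is in $\mathcal{D}(\tilde{\mathcal{H}}_Q)$, and $\mathsf{Conf}_{CT}$ be the set of well-formed configurations
in $(\mathsf{Stmt}_{CT} \cup \{\downarrow\}) \times \mathsf{St}_{CT}$. Let $\mathsf{St}_{CT}^{\mathit{stm}}$ be the set of states in $\mathsf{St}_{CT}$ that are
well-formed with respect to statement $\mathit{stm}$. Once again, by abuse of notation, we will denote
this set by $\mathsf{St}_{CT}$ when $\mathit{stm}$ is clear from the context.

A consequence of Lemma 1 is that $\mathsf{Conf}_{CT}$ is closed under reduction, in
the following sense. Let $\mathcal{D}^{\mathrm{fin}}_{\mathbb{A}}(A) \subseteq \mathcal{D}(A)$ be the set of finitely supported sub-distributions
$\delta$ with algebraic probabilities, i.e., $\delta(a) \in \mathbb{A}^+$ for all $a \in A$.


Lemma 2. The set $\mathcal{D}^{\mathrm{fin}}_{\mathbb{A}}(\mathsf{Conf}_{CT})$ is stable under reduction; more precisely, if
$\delta \in \mathcal{D}^{\mathrm{fin}}_{\mathbb{A}}(\mathsf{Conf}_{CT})$ and $\delta \xrightarrow{c} \varepsilon$, then $\varepsilon \in \mathcal{D}^{\mathrm{fin}}_{\mathbb{A}}(\mathsf{Conf}_{CT})$ and $c \in \mathbb{A}^+$.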


Computable expectations. We also restrict the expectation codomain to algebraic
numbers. Hence the considered expectations will be functions in $\mathsf{St}_{CT} \to \mathbb{A}^+$. On
its own, this restriction is not sufficient for our concerns, as the set $\mathsf{St}_{CT} \to \mathbb{A}^+$ is
not countable. It implies that there exist expectations in $\mathsf{St}_{CT} \to \mathbb{A}^+$ that are not
computable functions. To resolve this issue, we restrict the space of expectations
further to computable ones:

\[
\mathsf{E}_{CT} \triangleq \{f \mid f : \mathsf{St}_{CT} \to \mathbb{A}^+,\ f \text{ computable}\}.
\]

An immediate consequence of Lemma 2 is that $\mathrm{term}_{\mathit{stm}}(\sigma) \in \mathcal{D}(\mathsf{Conf}_{CT})$ for any
$\mathit{stm} \in \mathsf{Stmt}_{CT}$ and $\sigma \in \mathsf{St}_{CT}$. In consequence, $\mathrm{qwp}_{\mathit{stm}}\, f\, \sigma$ is well-defined for all
$f \in \mathsf{E}_{CT}$. This justifies that in our treatment below, we restrict expectations
to the class $\mathsf{E}_{CT}$. However, keep in mind that despite Lemma 2, the subdistribution
$\mathrm{term}_{\mathit{stm}}(\sigma)$, obtained at the limit, does not fall within $\mathcal{D}^{\mathrm{fin}}_{\mathbb{A}}(\mathsf{Conf}_{CT})$. It is neither
finite nor are its probabilities algebraic ($\mathbb{A}^+$ is not complete). In particular, in general
$\mathrm{qwp}_{\mathit{stm}}\, f\, \sigma$ is a real number, rather than an algebraic one.


Quantitative Problems. We now define formally the quantitative problems 
that we study. 


Testing problems. Some natural quantitative problems related to weakest pre-expectations
are to determine for a given program $\mathit{stm}$, a given state $\sigma$, a given expectation
$f$, and a given algebraic number $a$, whether the corresponding weakest
pre-expectation $\mathrm{qwp}_{\mathit{stm}}\, f\, \sigma$ is smaller than or equal to $a$. In this setting, it makes
sense to consider any possible relation in the set $\{<, \le, =, \ge, >\} \subseteq \mathcal{P}(\mathbb{A} \times \mathbb{A})$ as
one could be interested in finding precise values, (strict) upper- or lower-bounds.

Definition 3. The testing problem sets $\mathrm{TEST}_R \subseteq \mathsf{Conf}_{CT} \times \mathsf{E}_{CT} \times \mathbb{A}^+$, for $R \in
\{<, \le, =, \ge, >\}$, are defined by:

\[
(\mathit{stm}, \sigma, f, a) \in \mathrm{TEST}_R \ :\Longleftrightarrow\ (\mathrm{qwp}_{\mathit{stm}}\, f\, \sigma)\ R\ a.
\]

The consideration of both $\mathrm{TEST}_{\le}$ and $\mathrm{TEST}_{>}$ may seem redundant, as $\mathrm{TEST}_{>}$
can be viewed as the complement of $\mathrm{TEST}_{\le}$. However, it makes perfect sense
to distinguish both properties when considering the corresponding universal
problems, as we do in a moment.


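Finiteness problem. Another problem of interest consists in checking whether
the weakest pre-expectation produces some finite output.


Definition 4. The finiteness problem set $\mathrm{TEST}_{\neq\infty} \subseteq \mathsf{Conf}_{CT} \times \mathsf{E}_{CT}$ is defined
by:
\[
(\mathit{stm}, \sigma, f) \in \mathrm{TEST}_{\neq\infty} \ :\Longleftrightarrow\ \mathrm{qwp}_{\mathit{stm}}\, f\, \sigma < \infty.
\]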


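Termination problems. We also define two termination problems for almost sure
termination and positive almost sure termination:


Definition 5. The sets of (positive) almost-sure terminating configurations
$\mathrm{AST} \subseteq \mathsf{Conf}_{CT}$ ($\mathrm{PAST} \subseteq \mathsf{Conf}_{CT}$) are defined by:

\begin{align*}
(\mathit{stm}, \sigma) \in \mathrm{AST} &\ :\Longleftrightarrow\ |\mathrm{term}_{\mathit{stm}}(\sigma)| = 1\\
(\mathit{stm}, \sigma) \in \mathrm{PAST} &\ :\Longleftrightarrow\ \mathrm{edl}_{\mathit{stm}}(\sigma) < \infty.
\end{align*}

It is well-known that $\mathrm{PAST} \subsetneq \mathrm{AST}$, cf. [9].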


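Universal problems. Another kind of natural problem arises if one tries to check
some properties for each possible program input (i.e., for each state $\sigma$). We can
thus define universal properties for each of the sets described previously.


Definition 6. The sets of universal testing, finiteness and (positive) a.s. termination
problems are defined by:

\begin{align*}
(\mathit{stm}, f, g) \in \mathrm{UTEST}_R \subseteq \mathsf{Stmt}_{CT} \times \mathsf{E}_{CT}^2 &\ :\Longleftrightarrow\ \forall\sigma \in \mathsf{St}_{CT},\ (\mathit{stm}, \sigma, f, g(\sigma)) \in \mathrm{TEST}_R\\
(\mathit{stm}, f) \in \mathrm{UTEST}_{\neq\infty} \subseteq \mathsf{Stmt}_{CT} \times \mathsf{E}_{CT} &\ :\Longleftrightarrow\ \forall\sigma \in \mathsf{St}_{CT},\ (\mathit{stm}, \sigma, f) \in \mathrm{TEST}_{\neq\infty}\\
\mathit{stm} \in \mathrm{UAST} \subseteq \mathsf{Stmt}_{CT} &\ :\Longleftrightarrow\ \forall\sigma \in \mathsf{St}_{CT},\ (\mathit{stm}, \sigma) \in \mathrm{AST}\\
\mathit{stm} \in \mathrm{UPAST} \subseteq \mathsf{Stmt}_{CT} &\ :\Longleftrightarrow\ \forall\sigma \in \mathsf{St}_{CT},\ (\mathit{stm}, \sigma) \in \mathrm{PAST}
\end{align*}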

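Example 3. We have Cntoss $\in \mathrm{UAST}$ and Cntoss $\in \mathrm{UPAST}$, for the program
Cntoss of Figure 3. Indeed, it was shown in Example 2 that Cntoss terminates
with probability 1 and a finite expected derivation length. This property
holds for any input of the domain. In the same example, we have proven
(Cntoss, $f$) $\in \mathrm{UTEST}_{\neq\infty}$ for $f(s, \rho) = s(i)$. Indeed, we have shown the stronger
property (Cntoss, $f$, $g$) $\in \mathrm{UTEST}_{=}$, where $g(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) = 2 - (\beta + \gamma)$.

               Standard                                 Universal
               Problem                   Class          Problem                    Class
  Testing      $\mathrm{TEST}_{>}$       $\Sigma^0_1$       $\mathrm{UTEST}_{>}$       $\Pi^0_2$ (†)
               $\mathrm{TEST}_{\ge}$     $\Pi^0_2$ (†)      $\mathrm{UTEST}_{\ge}$     $\Pi^0_2$ (†)
               $\mathrm{TEST}_{=}$       $\Pi^0_2$          $\mathrm{UTEST}_{=}$       $\Pi^0_2$ (†)
               $\mathrm{TEST}_{\le}$     $\Pi^0_1$ (†)      $\mathrm{UTEST}_{\le}$     $\Pi^0_1$ (†)
               $\mathrm{TEST}_{<}$       $\Sigma^0_2$       $\mathrm{UTEST}_{<}$       $\Pi^0_3$ (†)
  Finiteness   $\mathrm{TEST}_{\neq\infty}$  $\Sigma^0_2$   $\mathrm{UTEST}_{\neq\infty}$  $\Pi^0_3$
  Termination  $\mathrm{AST}$            $\Pi^0_2$          $\mathrm{UAST}$            $\Pi^0_2$
               $\mathrm{PAST}$           $\Sigma^0_2$       $\mathrm{UPAST}$           $\Pi^0_3$

Table 1. Completeness results for quantitative problems in the arithmetical hierarchy.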


Completeness Results in the Arithmetical Hierarchy. In what follows,
we place the introduced quantitative problems within the arithmetical hierarchy [34].
The arithmetical hierarchy is a means to classify and relate undecidable
problems with respect to their inherent difficulty, measured in terms of the number of
(unbounded) quantifier alternations needed to state the problem as a formula in
first-order arithmetic, based on a decidable (recursive) predicate.


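Reminder on the arithmetical hierarchy. Classes of the arithmetical hierarchy
are defined inductively as follows:

\begin{align*}
\Sigma^0_0 &\triangleq \Pi^0_0 \triangleq \mathrm{REC}, \text{ REC being the class of decidable problems (recursive sets)}\\
\Pi^0_{n+1} &\triangleq \{\psi \mid \exists\varphi \in \Sigma^0_n,\ \forall\vec{x}.\,(\psi(\vec{x}) \iff \forall\vec{y}.\,\varphi(\vec{x}, \vec{y}))\},\\
\Sigma^0_{n+1} &\triangleq \{\psi \mid \exists\varphi \in \Pi^0_n,\ \forall\vec{x}.\,(\psi(\vec{x}) \iff \exists\vec{y}.\,\varphi(\vec{x}, \vec{y}))\}.
\end{align*}

For each $n$, $\Pi^0_n$ is the complement of $\Sigma^0_n$ (i.e., $\Pi^0_n = \text{co-}\Sigma^0_n$, and vice versa) and it
is a well-known result that $\Sigma^0_1$ and $\Pi^0_1$ correspond to the classes RE of recursively
enumerable (i.e., semi-decidable) problems and co-RE of co-recursively enumerable
(i.e., co-semi-decidable) problems, respectively. Given the sets $A \subseteq X$ and
$B \subseteq Y$, we write $A \le_m B$ ($A$ is many-one reducible to $B$) if there exists a computable
function $f : X \to Y$ such that $\forall x \in X$, $x \in A \iff f(x) \in B$. Given a
class $\mathcal{C}$ of the arithmetical hierarchy and a set $A$, $A$ is $\mathcal{C}$-hard if $\forall B \in \mathcal{C}$, $B \le_m A$.
A set $A$ is $\mathcal{C}$-complete if $A \in \mathcal{C}$ and $A$ is $\mathcal{C}$-hard. It is well-known that if a set $A$
is $\mathcal{C}$-complete then its complement, noted co-$A$, is (co-$\mathcal{C}$)-complete.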


Results. Table 1 associates the quantum decision problems with the corresponding
classes in the arithmetical hierarchy for which we have proven them complete,
that is, we have proven membership and hardness for the corresponding
class. Some of the results may seem surprising. For instance, the testing problem
$\mathrm{TEST}_{>}$, i.e., deciding $\mathrm{qwp}_{\mathit{stm}}\, f\, \sigma > a$ within the Clifford+T fragment, turns out
to be recursively enumerable. It is thus classified identically to the (classical) halting
problem H.$^5$ Remarkably, through the restriction to the Clifford+T fragment,
corresponding problems are ranked within the arithmetical hierarchy identically
to their non-quantum counterparts (see [37,24]). This observation holds for all
problems apart from those marked with (†) which, to the best of our knowledge, have
not been studied in a classical/probabilistic setting. $\Pi^0_2$- and $\Pi^0_3$-completeness
of the universal testing problems, given relations $>$ and $<$ respectively, has been
conjectured by Kaminski in his PhD thesis [25] for probabilistic programs.

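A crucial observation towards these results is that, restricting to the Clifford+T
fragment, the weakest pre-expectation of a program $\mathit{stm}$ can be approximated
through computable transformers $\mathrm{qwp}^{\le n}_{\mathit{stm}} : \mathsf{E}_{CT} \to \mathsf{E}_{CT}$ that limit execution
of $\mathit{stm}$ to at most $n \in \mathbb{N}$ reduction steps. That is,

\[
\mathrm{qwp}^{\le n}_{\mathit{stm}}\, f\, \sigma \triangleq \mathbb{E}_{\mathrm{term}^{\le n}_{\mathit{stm}}(\sigma)}(f)
\]

for $\mathrm{term}^{\le n}_{\mathit{stm}}(\sigma)$ the distribution of terminal configurations obtained within $n$ reduction
steps, when evaluating $(\mathit{stm}, \sigma)$. With regard to the above-mentioned
$\mathrm{TEST}_{>} \in \Sigma^0_1$ for instance, observe that:

\begin{align*}
(\mathit{stm}, \sigma, f, a) \in \mathrm{TEST}_{>} &\iff \mathrm{qwp}_{\mathit{stm}}\, f\, \sigma > a\\
&\iff \lim_{n \to \infty} \mathrm{qwp}^{\le n}_{\mathit{stm}}\, f\, \sigma > a\\
&\iff \exists n \in \mathbb{N},\ \exists\delta \in \mathbb{A}^+ \setminus \{0\},\ \mathrm{qwp}^{\le n}_{\mathit{stm}}\, f\, \sigma \ge a + \delta.
\end{align*}

Crucially, the predicate $\mathrm{qwp}^{\le n}_{\mathit{stm}}\, f\, \sigma \ge a + \delta$ becomes computable. In essence, this
is a consequence of Lemma 2: the $n$-th step normal form distribution $\mathrm{term}^{\le n}_{\mathit{stm}}(\sigma)$
is finite and computable and, as $f$ is computable, so is $\mathrm{qwp}^{\le n}_{\mathit{stm}}\, f\, \sigma$. From here,
the result follows as inequality over $\mathbb{A}$ is decidable. The proof of this, as well
as all completeness proofs listed in Table 1, can be found in the Appendix. The
following constitutes our first main result.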


Theorem 1. All completeness results in Table 1 hold. 
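The membership argument for $\mathrm{TEST}_{>}$ sketched before the theorem amounts to an unbounded search over step bounds. The code below illustrates this idea under assumptions: `approx` stands in for a computable, monotonically increasing under-approximation such as $n \mapsto \mathrm{qwp}^{\le n}_{\mathit{stm}}\, f\, \sigma$, and `approx_cntoss` is a hypothetical toy instance that bounds loop iterations of Cntoss (with $p_0 = 3/4$) rather than individual reduction steps.

```python
from fractions import Fraction as Fr
from itertools import count

def semi_decide_greater(approx, a, max_n=None):
    """Search for a bound n with approx(n) > a and return it as a witness.

    Without max_n the search diverges whenever the limit is <= a,
    which is the expected behavior of a semi-decision procedure.
    """
    bounds = count() if max_n is None else range(max_n + 1)
    for n in bounds:
        if approx(n) > a:
            return n
    return None

def approx_cntoss(n, p0=Fr(3, 4)):
    """Expected value of the counter i over runs that exit within n + 1 iterations."""
    p1 = 1 - p0
    return p0 + sum(p1 / 2**i * (i + 1) for i in range(1, n + 1))

print(semi_decide_greater(approx_cntoss, Fr(5, 4)))             # a witness exists
print(semi_decide_greater(approx_cntoss, Fr(3, 2), max_n=40))   # no witness up to the cutoff
```

The second call only terminates because of the cutoff `max_n`: the limit of the approximations equals $3/2$, so no finite approximation exceeds it.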


4 Quantum Expectation Transformers 


In what follows, we are interested in delineating subclasses of testing problems
that lead to decidability. To this end, we now define a notion of quantum expectation
transformer as a means to compute symbolically the weakest
pre-expectation of a program. We first introduce some preliminary notations in order
to lighten the presentation.


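Notations. For any expression $e$, $[e]$ is a shorthand notation for the function
$\lambda(s, \rho).\llbracket e \rrbracket^s \in \mathsf{St} \to \mathbb{R}^{+\infty}$. We will also use $f[x := e]$ for the expectation
$\lambda(s, \rho).f(s[x := \llbracket e \rrbracket^s], \rho)$. Similarly, for a given map $\chi : \mathcal{D}(\mathcal{H}_Q) \to \mathcal{D}(\mathcal{H}_Q)$,
$f[\chi] \triangleq \lambda(s, \rho).f(s, \chi(\rho))$. We will also sometimes group such state modifications,
for instance, $f[x := e; \chi]$ stands for $(f[x := e])[\chi]$ and $f[x := e, y := e']$ stands
for $(f[x := e])[y := e']$.

$^5$ In our context the halting set H can be defined as the class of classical programs
and states $(P, \sigma)$ for which $P$ is halting on $\sigma$.

\begin{align*}
\mathrm{qet}[\texttt{skip}]\{f\} &\triangleq f\\
\mathrm{qet}[x = e]\{f\} &\triangleq f[x := e]\\
\mathrm{qet}[\mathit{stm}_1; \mathit{stm}_2]\{f\} &\triangleq \mathrm{qet}[\mathit{stm}_1]\{\mathrm{qet}[\mathit{stm}_2]\{f\}\}\\
\mathrm{qet}[\texttt{if } b \texttt{ then } \mathit{stm}_1 \texttt{ else } \mathit{stm}_2]\{f\} &\triangleq \mathrm{qet}[\mathit{stm}_1]\{f\} +_{[b]} \mathrm{qet}[\mathit{stm}_2]\{f\}\\
\mathrm{qet}[\texttt{while } b \texttt{ do } \mathit{stm}]\{f\} &\triangleq \mathrm{lfp}\,(\lambda F.\,\mathrm{qet}[\mathit{stm}]\{F\} +_{[b]} f)\\
\mathrm{qet}[\bar{q} \mathrel{*}= U]\{f\} &\triangleq f[\Phi_U]\\
\mathrm{qet}[x = \texttt{meas } q_i]\{f\} &\triangleq f[x := 0; m_{0,i}] +_{p_{0,i}} f[x := 1; m_{1,i}]
\end{align*}

Fig. 5. Quantum expectation transformer $\mathrm{qet}[\cdot]\{\cdot\}$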

For $p \in \mathsf{St} \to [0,1]$ and $f, g \in \mathsf{St} \to \mathbb{R}^{+\infty}$, $f +_p g$ denotes the function
$\lambda\sigma.\,p(\sigma) \cdot f(\sigma) + (1 - p(\sigma)) \cdot g(\sigma) \in \mathsf{St} \to \mathbb{R}^{+\infty}$; similarly, we use $f \cdot g$ to denote
$\lambda\sigma.\,f(\sigma) \cdot g(\sigma) \in \mathsf{St} \to \mathbb{R}^{+\infty}$. Thus, for instance, $f[x := x + 1] +_{[x = 1]} f$ behaves
like $f$, except that $x$ is first incremented when applied to states with classical
variable $x$ equal to 1. In correspondence to the normalization of quantum state
$m_{k,i}$, we define probabilities $p_{k,i} \triangleq \lambda\rho.\,\mathrm{tr}(M_{k,i}\,\rho\,M_{k,i}^\dagger)$. We overload this function
from $\mathcal{D}(\mathcal{H}_Q)$ to $\mathsf{St}$ such that $p_{k,i}(s, \rho) \triangleq p_{k,i}(\rho)$. In this way, $f[x := 0; m_{0,i}] +_{p_{0,i}}
f[x := 1; m_{1,i}]$ computes precisely the expected value of $f$ on the distribution
of states obtained by measuring the $i$-th qubit and assigning the outcome to
classical variable $x$.

Finally, we denote by $\le$ also the pointwise extension of the order from $\mathbb{R}^{+\infty}$
to functions, that is, $f \le g$ holds iff $\forall\sigma \in \mathsf{St}$, $f(\sigma) \le g(\sigma)$.


Definition 7 (Quantum expectation transformer). The quantum expectation
transformer consists of a program semantics mapping expectations to expectations
in a continuation-passing style

\[
\mathrm{qet}[\cdot]\{\cdot\} : \mathsf{Stmt} \to (\mathsf{St} \to \mathbb{R}^{+\infty}) \to (\mathsf{St} \to \mathbb{R}^{+\infty})
\]
and is defined inductively on statements in Figure 5.


This transformer corresponds to the notion of expected value transformer of [6] on
the Kegelspitze $S = (\mathbb{R}^{+\infty}, +_f)$, with $+_f$ being the forgetful addition. In the case
of loops, the least fixed point lfp is defined with respect to the pointwise ordering
on the function space $\mathsf{St} \to \mathbb{R}^{+\infty}$. Equipped with this ordering, this space forms
an $\omega$-CPO. As the quantum transformer can be shown to be $\omega$-continuous, the
fixed-point is always defined, cf. [44].
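The inductive definition of Figure 5 can be prototyped directly. The sketch below is an illustration under several assumptions: it handles a single qubit with real rational density matrices, encodes statements as tuples, represents a gate by its superoperator, and approximates the least fixed point of a loop from below by a fixed number of Kleene iterations (`fuel`); all names are hypothetical.

```python
from fractions import Fraction as Fr

def phi_H(rho):
    """Hadamard superoperator on a single-qubit density matrix."""
    (a, b), (c, d) = rho
    return (((a + b + c + d) / 2, (a - b + c - d) / 2),
            ((a + b - c - d) / 2, (a - b - c + d) / 2))

BASIS = {0: ((Fr(1), Fr(0)), (Fr(0), Fr(0))), 1: ((Fr(0), Fr(0)), (Fr(0), Fr(1)))}

def qet(stm, f, fuel=50):
    """Return qet[stm]{f} as a function (store, rho) -> Fraction."""
    kind = stm[0]
    if kind == "skip":
        return f
    if kind == "assign":                       # ("assign", x, e), e a function of the store
        _, x, e = stm
        return lambda s, rho: f({**s, x: e(s)}, rho)
    if kind == "seq":                          # ("seq", stm1, stm2)
        return qet(stm[1], qet(stm[2], f, fuel), fuel)
    if kind == "if":                           # ("if", b, stm1, stm2)
        _, b, s1, s2 = stm
        f1, f2 = qet(s1, f, fuel), qet(s2, f, fuel)
        return lambda s, rho: f1(s, rho) if b(s) else f2(s, rho)
    if kind == "unitary":                      # ("unitary", superoperator)
        return lambda s, rho: f(s, stm[1](rho))
    if kind == "meas":                         # ("meas", x): measure the only qubit
        x = stm[1]
        def measured(s, rho):
            total = Fr(0)
            for k in (0, 1):
                p = rho[k][k]                  # probability of outcome k
                if p != 0:                     # skip zero-probability outcomes
                    total += p * f({**s, x: k}, BASIS[k])
            return total
        return measured
    if kind == "while":                        # ("while", b, body)
        _, b, body = stm
        approx = lambda s, rho: Fr(0)          # bottom element of the omega-CPO
        for _ in range(fuel):
            step = qet(body, approx, fuel)
            approx = (lambda st: lambda s, rho: st(s, rho) if b(s) else f(s, rho))(step)
        return approx
    raise ValueError(f"unsupported statement: {kind}")

BODY = ("seq", ("assign", "i", lambda s: s["i"] + 1),
        ("seq", ("unitary", phi_H), ("meas", "x")))
CNTOSS = ("seq", ("assign", "x", lambda s: 1),
          ("seq", ("assign", "i", lambda s: 0),
           ("while", lambda s: s["x"] == 1, BODY)))

rho = ((Fr(3, 4), Fr(1, 4)), (Fr(1, 4), Fr(1, 4)))
value = qet(CNTOSS, lambda s, r: Fr(s["i"]))({}, rho)
print(float(value))
```

For this input, $\beta + \gamma = 1/2$, and the computed under-approximation approaches $2 - (\beta + \gamma) = 3/2$ from below as `fuel` grows, consistent with Example 2.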


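Theorem 2 (Adequacy). The following holds:

\[
\forall\mathit{stm} \in \mathsf{Stmt},\ \forall f : \mathsf{St} \to \mathbb{R}^{+\infty},\quad \mathrm{qwp}_{\mathit{stm}}(f) = \mathrm{qet}[\mathit{stm}]\{f\}.
\]

\begin{align*}
\text{continuity} &\quad \mathrm{qet}[\mathit{stm}]\{\sup_i f_i\} = \sup_i \mathrm{qet}[\mathit{stm}]\{f_i\}\\
\text{monotonicity} &\quad f \le g \implies \mathrm{qet}[\mathit{stm}]\{f\} \le \mathrm{qet}[\mathit{stm}]\{g\}\\
\text{upper invariance} &\quad ([\neg b] \cdot f \le g \ \wedge\ [b] \cdot \mathrm{qet}[\mathit{stm}]\{g\} \le g) \implies \mathrm{qet}[\texttt{while } b \texttt{ do } \mathit{stm}]\{f\} \le g
\end{align*}

Fig. 6. Universal laws derivable for the quantum expectation transformer.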


Apart from continuity, the quantum expectation transformer satisfies several
useful laws, see Figure 6. The (monotonicity) Law permits us to reason modulo
upper-bounds: actual expectations can always be substituted by upper-bounds.
It is in fact an immediate consequence of the (continuity) Law, which is
defined for any $\omega$-chain $(f_i)_i$. The (upper invariance) Law constitutes a generalization
of the notion of invariant stemming from Hoare calculus. It is used to find
closed-form upper-bounds $g$ to expectations $f$ of loops. The pre-conditions state
that $g$ should dominate $f$ on states where the loop would immediately exit,
and otherwise, should remain invariant under iteration. It is worth mentioning
that this proof rule is not only sound, but also complete, in the sense that any
upper-bound satisfies the two constraints. The following example illustrates the
use of this rule on the running example.


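Example 4. Following Example 2, we over-approximate $\mathrm{qet}[\texttt{Cntoss}]\{f\}$, for
$f(s, \rho) \triangleq s(i)$ the post-expectation measuring the classical variable i.

To this end, we look for a function $g : \mathsf{St} \to \mathbb{R}^{+\infty}$ that is an upper-invariant
(Figure 6) of the loop while x do stm, given a post-expectation $f : \mathsf{St} \to
\mathbb{R}^{+\infty}$. Recall that the loop body stm comprises (i = i+1; q *= H; x = meas q).
To fulfill the conditions of the (upper invariance) Law the following inequalities
have to be met:

\[
[\neg x] \cdot f \le g \qquad [x] \cdot \mathrm{qet}[\texttt{i = i+1; q *= H; x = meas q}]\{g\} \le g. \tag{5}
\]
By unfolding the definition, we see

\begin{align*}
&\mathrm{qet}[\texttt{i = i+1; q *= H; x = meas q}]\{g\}\\
&= \mathrm{qet}[\texttt{i = i+1}]\{\mathrm{qet}[\texttt{q *= H}]\{\mathrm{qet}[\texttt{x = meas q}]\{g\}\}\}\\
&= \mathrm{qet}[\texttt{i = i+1}]\{\mathrm{qet}[\texttt{q *= H}]\{g[x{:=}0; m_{0,1}] +_{p_{0,1}} g[x{:=}1; m_{1,1}]\}\}\\
&= \mathrm{qet}[\texttt{i = i+1}]\{g[x{:=}0; m_{0,1}; \Phi_H] +_{p_{0,1} \circ \Phi_H} g[x{:=}1; m_{1,1}; \Phi_H]\}\\
&= g[x{:=}0; m_{0,1}; \Phi_H; i{:=}i+1] +_{p_{0,1} \circ \Phi_H} g[x{:=}1; m_{1,1}; \Phi_H; i{:=}i+1]\\
&= \lambda(s, \rho). \sum_{k \in \{0,1\}} p_{k,1}(\Phi_H(\rho)) \cdot g(s[x := k, i{:=}i+1], m_{k,1}(\Phi_H(\rho))).
\end{align*}

By using the identities computed already in Example 2, we thus obtain

\[
\mathrm{qet}[\mathit{stm}]\{g\}\,(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) = \sum_{k \in \{0,1\}} p_k \cdot g(s[x := k, i{:=}i+1], \rho_k), \tag{6}
\]
where, as in Example 2, $p_0 = \frac{1 + \beta + \gamma}{2}$, $p_1 = \frac{1 - (\beta + \gamma)}{2}$, $\rho_0 = \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}$ and $\rho_1 = \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}$.
We claim that $g(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) \triangleq s(i) + s(x) \cdot (2 - (\beta + \gamma))$ is an upper-bound to
the pre-expectation of the while loop with respect to the post-expectation $f$. To this end,
we check (5). The first inequality is trivially satisfied. Concerning the second,
notice that by definition,

\[
g(s[x := 0, i{:=}i+1], \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}) = s(i) + 1 \quad\text{and}\quad g(s[x := 1, i{:=}i+1], \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}) = s(i) + 3.
\]
By (6) we have

\begin{align*}
\mathrm{qet}[\mathit{stm}]\{g\}\,(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) &= p_0 \cdot (s(i) + 1) + p_1 \cdot (s(i) + 3)\\
&= (s(i) + 2) - (\beta + \gamma) = g(s[x := 1], \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}),
\end{align*}

from which now the second constraint follows by case analysis on the value of x.
Hence $\mathrm{qet}[\texttt{while x do stm}]\{f\} \le g$ and, by monotonicity (Figure 6),

\begin{align*}
\mathrm{qet}[\texttt{Cntoss}]\{f\}\,(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) &\le \mathrm{qet}[\texttt{x = tt; i = 0}]\{g\}\,(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix})\\
&= g([x{:=}1, i{:=}0], \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}) = 2 - (\beta + \gamma).
\end{align*}

Note that, in this case, the computed bound is exact.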
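The two inequalities (5) can be spot-checked on sample states. The following sketch is a sanity check rather than a proof: it assumes rational values of $\beta + \gamma$ in $[-1, 1]$, evaluates the loop body through identity (6), and uses hypothetical helper names.

```python
from fractions import Fraction as Fr
from itertools import product

def g(i, x, beta_plus_gamma):
    """Candidate upper invariant g(s, rho) = s(i) + s(x) * (2 - (beta + gamma))."""
    return i + x * (2 - beta_plus_gamma)

def body_pre(i, beta_plus_gamma):
    """qet[stm]{g} via identity (6); both post-measurement states are basis
    states, whose off-diagonal entries sum to 0."""
    p0 = (1 + beta_plus_gamma) / 2
    p1 = 1 - p0
    return p0 * g(i + 1, 0, Fr(0)) + p1 * g(i + 1, 1, Fr(0))

def is_upper_invariant(samples):
    for i, x, bg in samples:
        f_val = i                                                # post-expectation f = s(i)
        exit_ok = (x == 1) or (f_val <= g(i, x, bg))             # [not x] * f <= g
        iter_ok = (x == 0) or (body_pre(i, bg) <= g(i, x, bg))   # [x] * qet[stm]{g} <= g
        if not (exit_ok and iter_ok):
            return False
    return True

samples = list(product(range(5), (0, 1), [Fr(k, 4) for k in range(-4, 5)]))
print(is_upper_invariant(samples))
```

On every sampled state with $x = 1$ the second inequality holds with equality, which matches the observation that the computed bound is exact.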


One question of interest is to find the $\mathrm{qet}[\cdot]\{\cdot\}$ of a given statement. We
obtain the following completeness results as a corollary of Theorem 1 and
Theorem 2 on the Clifford+T fragment.


Corollary 1. The following completeness results hold:

- $\{(\mathit{stm}, f, g) \in \mathsf{Stmt}_{CT} \times \mathsf{E}_{CT}^2 \mid \forall\sigma,\ \mathrm{qet}[\mathit{stm}]\{f\}(\sigma) = g(\sigma)\}$ is $\Pi^0_2$-complete.
- $\{(\mathit{stm}, f, g) \in \mathsf{Stmt}_{CT} \times \mathsf{E}_{CT}^2 \mid \forall\sigma,\ \mathrm{qet}[\mathit{stm}]\{f\}(\sigma) \le g(\sigma)\}$ is $\Pi^0_1$-complete.


The same kind of result can be straightforwardly obtained for each of the quantitative
problems defined in the previous section. All the corresponding sets are undecidable:
they are at best (co-)semi-decidable as illustrated by Table 1. This
motivates us to restrict the problem a bit further to find a class of functions
for which the quantitative problems for $\mathrm{qwp}_{\mathit{stm}}\, f$ can be decided.


5 Decidability of qet Inference over a Real Closed Field 


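Corollary 1 illustrates that it is not sufficient to relax the problem of finding the
quantum expectation transformer of a given statement to upper-bounds, in order
to make it decidable. The undecidability of finding the quantum expectation
transformer of a given program is due to two other issues: 1) Issue 1: The
computation of a fixpoint for $\mathrm{qet}[\cdot]\{\cdot\}$ in the case of while loops, 2) Issue 2:
The check for inequalities over functions in $\mathsf{E}_{CT}$, whose first-order theory is not
decidable. This section is devoted to overcoming these two issues, by finding an
expressive fragment on which the inference of an upper-bound of the quantum
expectation transformer becomes decidable.

\begin{align*}
\mathrm{qinf}[\texttt{skip}]\{F\} &\triangleq F\\
\mathrm{qinf}[x = e]\{F\} &\triangleq F[x{:=}e]\\
\mathrm{qinf}[\mathit{stm}_1; \mathit{stm}_2]\{F\} &\triangleq \mathrm{qinf}[\mathit{stm}_1]\{\mathrm{qinf}[\mathit{stm}_2]\{F\}\}\\
\mathrm{qinf}[\texttt{if}^{\ell}\ b\ \texttt{then } \mathit{stm}_1 \texttt{ else } \mathit{stm}_2]\{F\} &\triangleq X_\ell, \text{ with side-cond. }
\begin{cases} b \vdash \mathrm{qinf}[\mathit{stm}_1]\{F\} \le X_\ell\\ \neg b \vdash \mathrm{qinf}[\mathit{stm}_2]\{F\} \le X_\ell \end{cases}\\
\mathrm{qinf}[\texttt{while}^{\ell}\ b\ \texttt{do } \mathit{stm}]\{F\} &\triangleq X_\ell, \text{ with side-cond. }
\begin{cases} b \vdash \mathrm{qinf}[\mathit{stm}]\{X_\ell\} \le X_\ell\\ \neg b \vdash F \le X_\ell \end{cases}\\
\mathrm{qinf}[\bar{q} \mathrel{*}= U]\{F\} &\triangleq F[\Phi_U]\\
\mathrm{qinf}[x = \texttt{meas}^{\ell}\ q_i]\{F\} &\triangleq X_\ell, \text{ with side-cond. }
\begin{cases} p_{0,i} = 0 \vdash F[x := 1; m_{1,i}] \le X_\ell\\ p_{1,i} = 0 \vdash F[x := 0; m_{0,i}] \le X_\ell\\ p_{0,i} \neq 0 \neq p_{1,i} \vdash F[x := 0; m_{0,i}] +_{p_{0,i}} F[x := 1; m_{1,i}] \le X_\ell \end{cases}
\end{align*}

Fig. 7. Term representations of $\mathrm{qinf}[\cdot]\{\cdot\}$ and their corresponding side-conditions.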


Symbolic Inference. As a first step towards automated inference, we define
a symbolic variant of the quantum expectation transformer in Figure 7. In the
case of conditionals, loops, and measurements, we will use fresh variables for
expectations; side conditions will guarantee that these variables indeed denote
(upper-bounds to) the corresponding expectations. This means that the symbolic
version yields correct results only when the expectations assigned to these
variables satisfy all the side conditions. By solving the generated constraints,
viz., by finding an interpretation of ascribed variables that satisfies the imposed
side-conditions, we effectively arrive at an inference procedure overcoming Issue 1.

To formalize this approach, we associate a unique label $\ell$ with each loop, conditional,
and measurement occurring in the considered program. Notationally,
we write $\texttt{while}^{\ell}\ b\ \texttt{do } \mathit{stm}$ / $\texttt{if}^{\ell}\ b\ \texttt{then } \mathit{stm}_1 \texttt{ else } \mathit{stm}_2$ / $\texttt{meas}^{\ell}\ q$. Such labels
permit us to associate a unique expectation variable $X_\ell$ to each of these constructs.
Given a set of such expectation variables $\mathsf{EVar}$, the set of terms $\mathsf{ETerm}$,
upon which the symbolic quantum expectation transformer operates, is defined
according to the following grammar:

\[
\mathsf{ETerm} \quad F, G ::= X \mid F[x := e] \mid F[\chi] \mid F +_p G,
\]


where $X$ stands for an arbitrary expectation variable in $\mathsf{EVar}$. As stressed above,
$X$ will be used to denote certain expectations with respect to loops, conditionals, and
measurements. We have already introduced the notations $F[x := e]$ and $F[\chi]$ to
represent updates to the classical and quantum state, respectively. Here, $\chi$ will
always denote a finite composition of superoperators $\Phi_U$ and measurements $m_{k,i}$.
By ensuring that normalization of quantum states $m_{k,i}(\rho)$ is never considered
in the degenerate case of a zero-probability measurement $p_{k,i}(\rho)$, it will thereby
always be possible to write $\chi$ as $\lambda\rho.\,\frac{M \rho M^\dagger}{\mathrm{tr}(M \rho M^\dagger)}$ for some $M \in \mathcal{M}(\tilde{\mathcal{H}}_Q)$ in the Clifford+T
fragment. Finally, following the same reasoning, in the barycentric sum
$F +_p G$ the probability $p$ is a function in the quantum state, and will always
be of general form $\lambda\rho.\,\frac{\mathrm{tr}(M \rho M^\dagger)}{\mathrm{tr}(N \rho N^\dagger)}$ for some $M, N \in \mathcal{M}(\tilde{\mathcal{H}}_Q)$ in the Clifford+T
fragment. Similar to before, we may group updates such as in $F[x := e; \chi]$.


The symbolic variation of the expectation transformer can now be defined as
\[
\mathrm{qinf}[\cdot]\{\cdot\} : \mathsf{Stmt} \to \mathsf{ETerm} \to \mathsf{ETerm},
\]

generating also a set of side-conditions of the shape $\Gamma \vdash F \le G$, with the
intended meaning that $G$ bounds $F$ on all input states that satisfy the predicate
$\Gamma$. The full definition of $\mathrm{qinf}$ is given in Figure 7. As already hinted, the side
conditions ensure that introduced variables $X_\ell$ indeed yield an upper-bound on
the corresponding expectation, in the case of conditionals by case-analysis, and
in the case of loops via an application of the upper-invariant law from Figure 6.
In the case of measurements, $m_{k,i}$ and $p_{k,i}$ are defined exactly as before. Here,
we single out the two cases where the probability of a measurement, either
$p_{0,i}(\rho) = \mathrm{tr}(M_{0,i}\rho) = \mathrm{tr}(M_{0,i}\,\rho\,M_{0,i}^\dagger)$ or $p_{1,i}(\rho) = 1 - p_{0,i}(\rho)$, is zero. This way, we
avoid the case analysis underlying the definition of $m_{k,i}$ and may, without loss of generality, assume
that it is indeed of the form $\lambda\rho.\,\frac{M_{k,i}\,\rho\,M_{k,i}^\dagger}{\mathrm{tr}(M_{k,i}\,\rho\,M_{k,i}^\dagger)}$ with non-zero trace $\mathrm{tr}(M_{k,i}\,\rho\,M_{k,i}^\dagger)$.
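The symbolic transformer can be prototyped as a function that returns a term and collects side-conditions. The sketch below is an illustration under assumptions: terms and constraints are plain strings, statements are tuples, labels are generated as consecutive integers (so the variables are named `X1`, `X2`, and so on rather than carrying letter labels), and the function name `qinf` merely mirrors the notation of Figure 7.

```python
from itertools import count

def qinf(stm, term, constraints, fresh):
    """Return the term for qinf[stm]{term}; side-conditions are appended to `constraints`."""
    kind = stm[0]
    if kind == "skip":
        return term
    if kind == "assign":                      # ("assign", "x", "e")
        return f"{term}[{stm[1]}:={stm[2]}]"
    if kind == "seq":                         # ("seq", stm1, stm2)
        return qinf(stm[1], qinf(stm[2], term, constraints, fresh), constraints, fresh)
    if kind == "unitary":                     # ("unitary", "H")
        return f"{term}[Phi_{stm[1]}]"
    x_l = f"X{next(fresh)}"                   # fresh expectation variable for this label
    if kind == "if":                          # ("if", "b", stm1, stm2)
        _, b, s1, s2 = stm
        constraints.append(f"{b} |- {qinf(s1, term, constraints, fresh)} <= {x_l}")
        constraints.append(f"not {b} |- {qinf(s2, term, constraints, fresh)} <= {x_l}")
    elif kind == "while":                     # ("while", "b", body)
        _, b, body = stm
        constraints.append(f"{b} |- {qinf(body, x_l, constraints, fresh)} <= {x_l}")
        constraints.append(f"not {b} |- {term} <= {x_l}")
    elif kind == "meas":                      # ("meas", "x", i)
        _, x, i = stm
        zero, one = f"{term}[{x}:=0; m_0,{i}]", f"{term}[{x}:=1; m_1,{i}]"
        constraints.append(f"p_0,{i} = 0 |- {one} <= {x_l}")
        constraints.append(f"p_1,{i} = 0 |- {zero} <= {x_l}")
        constraints.append(f"p_0,{i} != 0 != p_1,{i} |- {zero} +_p_0,{i} {one} <= {x_l}")
    else:
        raise ValueError(f"unsupported statement: {kind}")
    return x_l

body = ("seq", ("assign", "i", "i+1"), ("seq", ("unitary", "H"), ("meas", "x", 1)))
cntoss = ("seq", ("assign", "x", "tt"), ("seq", ("assign", "i", "0"), ("while", "x", body)))

side_conditions = []
result = qinf(cntoss, "X", side_conditions, count(1))
print(result)
for c in side_conditions:
    print(c)
```

For the Cntoss program this yields one term and five side-conditions: three for the measurement and two for the loop.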


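Example 5. In correspondence to Example 4, let us consider the application of
the inference procedure on the program Cntoss, with respect to the post-expectation
$f(s, \rho) = s(i)$. We label the loop and the measurement with $w$ and $m$, respectively.
Let $X$ denote the post-expectation $f$. Unfolding the definition, we see

\begin{align*}
\mathrm{qinf}[\texttt{Cntoss}]\{X\} &= \mathrm{qinf}[\texttt{x = tt; i = 0; while}^{w}\ \texttt{x do stm}]\{X\}\\
&= X_w[x{:=}1; i{:=}0],
\end{align*}

generating the side-conditions $x \vdash X_m[\Phi_H; i{:=}i+1] \le X_w$ and $\neg x \vdash X \le X_w$.
The left-hand side of the first constraint is obtained from

\begin{align*}
\mathrm{qinf}[\mathit{stm}]\{X_w\} &= \mathrm{qinf}[\texttt{i = i+1}]\{\mathrm{qinf}[\texttt{q *= H}]\{\mathrm{qinf}[\texttt{x = meas}^{m}\ \texttt{q}]\{X_w\}\}\}\\
&= X_m[\Phi_H; i{:=}i+1].
\end{align*}

Note that this expansion generates further constraints, this time on $X_m$ representing
the measurement. Specifically, it yields the following constraints:

\begin{align*}
p_{k,1} = 0 &\vdash X_w[x := 1-k; m_{1-k,1}] \le X_m \quad (\text{for } k \in \{0,1\}),\\
p_{0,1} \neq 0 \neq p_{1,1} &\vdash X_w[x{:=}0; m_{0,1}] +_{p_{0,1}} X_w[x{:=}1; m_{1,1}] \le X_m.
\end{align*}

Using the analysis from Example 4, we interpret $X_w$ and $X_m$ under an assignment $a$ as:
\begin{align*}
a(X_w) &\triangleq \lambda(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}).\ s(i) + s(x) \cdot (2 - (\beta + \gamma)),\\
a(X_m) &\triangleq \lambda(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}).\ s(i) + 2 - 2\alpha.
\end{align*}
Furthermore, we interpret the input variable $X$ as $f$, i.e., $a(X) \triangleq \lambda(s, \rho).\ s(i)$.
Notice how $a(X_w)$ just corresponds to the upper-invariant $g$ derived in Example 4.
Using the assignment, it is now standard to check that it is a solution to
the five constraints. For instance, considering states $\sigma = (\{i{:=}n, x{:=}c\}, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix})$,
the last constraint amounts to the implication

\[
\alpha \neq 0 \neq \delta \implies \alpha \cdot n + \delta \cdot (n + 2) \le n + 2 - 2\alpha,
\]
which trivially holds, since $\alpha + \delta = 1$. Finally, recall $\mathrm{qinf}[\texttt{Cntoss}]\{X\} = X_w[x{:=}1; i{:=}0]$. This
term is interpreted as $\lambda(s, \begin{pmatrix} \alpha & \beta\\ \gamma & \delta \end{pmatrix}).\ 2 - (\beta + \gamma)$, yielding the optimal bound computed
in Example 4.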


Example 6. Re-consider program RUS depicted in Figure 1. Here, we are interested in an upper bound on the number of T-gates, counted by the program variable i. As before, we label the loop and measurement with w and m, respectively. Let

$$\mathtt{stm} = \underbrace{q_2 = |0\rangle;\ \ldots;}_{\mathtt{stm}_0}\ x = \mathsf{meas}^m\, q_2$$

be the body of the while loop statement (see Figure 1). We proceed with the analysis backwards. By the rules of Figure 7 it holds that $\mathsf{qinf}[\mathtt{stm}_0]\{F\} = F[\Phi;\, i := i+2]$ for any F, where $\Phi$ gives the quantum state updates within $\mathtt{stm}_0$. Unfolding definitions, we have $\mathsf{qinf}[\mathtt{RUS}]\{X\} = X_w[x := 0;\, i := 1]$ with $x \vdash X_m[\Phi;\, i := i+2] \le X_w$ and $\neg x \vdash X \le X_w$, since, by the above observation,

$$\mathsf{qinf}[\mathtt{stm}]\{X_w\} = \mathsf{qinf}[\mathtt{stm}_0]\{\mathsf{qinf}[x = \mathsf{meas}^m\, q_2]\{X_w\}\} = X_m[\Phi;\, i := i+2],$$

subject to the following additional constraints stemming from measurements:

$$p_{k,2} = 0 \vdash X_w[x := 1-k;\, m_{1-k,2}] \le X_m \quad (\text{for } k \in \{0,1\}),$$
$$p_{0,2} \neq 0 \neq p_{1,2} \vdash p_{0,2} \cdot X_w[x := 0;\, m_{0,2}] + p_{1,2} \cdot X_w[x := 1;\, m_{1,2}] \le X_m.$$


Taking $a(X) = \lambda(s, \rho).\ s(i)$ and solving the constraints yields a constant upper bound of 8/3 on the expected number of T-gates used by the program. This is due to the fact that the probability of the internal measurement is always 3/4. Note that this bound is tight.
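The value 8/3 is consistent with a simple geometric-series calculation. The sketch below is not the inference procedure of this section; it only checks the arithmetic under two assumptions made for illustration: each loop round costs 2 T-gates, and each round succeeds independently with probability 3/4. The function name `expected_cost` is hypothetical.

```python
from fractions import Fraction


def expected_cost(cost_per_round: int, p_success: Fraction, rounds: int) -> Fraction:
    """Partial sum of the expected cost of a repeat-until-success loop.

    Round k (k = 1, 2, ...) is reached with probability (1 - p)^(k-1) and
    always pays cost_per_round, so the expected cost is the series
    sum_k cost_per_round * (1 - p)^(k-1), truncated here after `rounds` terms.
    """
    q = 1 - p_success
    return sum(cost_per_round * q ** (k - 1) for k in range(1, rounds + 1))


p = Fraction(3, 4)               # assumed success probability per round
closed_form = Fraction(2) / p    # limit of the series: cost per round / p
approx = expected_cost(2, p, rounds=40)
```

Using exact rationals avoids rounding noise; the truncated sum approaches the closed form from below.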


The transformer qinf can of course be linked to qet only when the variables $X_\ell$ are interpreted in such a way that the side conditions generated by qinf are met. To spell this out formally, let $a : \mathsf{EVar} \to \mathsf{E}_{CT}$ be an assignment of expectations to variables in EVar, and let $[\![F]\!]^a : \mathsf{E}_{CT}$ denote the interpretation of $F \in \mathsf{ETerm}$ under a, defined in the natural way, e.g., $[\![X_\ell]\!]^a = a(X_\ell)$, $[\![F[\gamma]]\!]^a = [\![F]\!]^a[\gamma]$, etc.

We say that a constraint $\Gamma \vdash F \le G$ is valid under a if $[\![F]\!]^a(\sigma) \le [\![G]\!]^a(\sigma)$ holds for all states $\sigma \in \mathsf{St}_{CT}$ with $\Gamma(\sigma)$. An assignment a is a solution to a set of constraints C if it makes every constraint in C valid. Finally, we say a is a solution to $\mathsf{qinf}[\mathtt{stm}]\{F\}$ if it is a solution to the set of constraints generated by $\mathsf{qinf}[\mathtt{stm}]\{F\}$. We have the following correspondence:

Theorem 3. For any $a \in \mathsf{EVar} \to \mathsf{E}_{CT}$, if a is a solution to $\mathsf{qinf}[\mathtt{stm}]\{F\} = G$, then it holds that $\mathsf{qet}[\mathtt{stm}]\{[\![F]\!]^a\} \le [\![G]\!]^a$.


It is worth mentioning that the above procedure could have been defined without restriction, on the full space $\mathsf{St} \to \mathbb{R}^{+\infty}$ of expectations. In this case, the symbolic approach is also complete, in the sense that if $\mathsf{qet}[\mathtt{stm}]\{f\} = g$ then $\mathsf{qinf}[\mathtt{stm}]\{X\} = G$ for some G such that the side-conditions have a solution a, with $a(X) = f$ and $[\![G]\!]^a = g$. As our main focus is on decidability, however, we have chosen to restrict ourselves to the Clifford+T setting.


Restriction to Polynomials over the Real Closed Field $\mathbb{A}$. We now turn to constraint solving, addressing the remaining Issue 2 by restricting the domain of expectations to polynomials over algebraic numbers. To be more precise, we consider the following problem.


Definition 8. Let $\mathbb{E} \subseteq \mathsf{E}_{CT}$ be a class of expectations. The inference problem $\mathrm{QINFER}(\mathbb{E}) \subseteq \mathsf{Stmt}_{CT} \times \mathbb{E} \times (\mathsf{EVar} \to \mathbb{E})$ is given by

$$(\mathtt{stm}, f, a) \in \mathrm{QINFER}(\mathbb{E}) \;:\Longleftrightarrow\; a[X := f] \text{ is a solution to } \mathsf{qinf}[\mathtt{stm}]\{X\}.$$

In the above definition, $(\mathtt{stm}, f, a) \in \mathrm{QINFER}(\mathbb{E})$ is satisfied if the statement stm has solution $a[X := f]$ wrt. the expectation f. Hence it can be seen as checking whether f is a post-expectation for stm. In particular, any solution $a[X := f]$ constitutes an upper bound on the weakest pre-expectation of f (see Theorem 3). We will now see that $\mathrm{QINFER}(\mathbb{E})$ is decidable, for $\mathbb{E}$ the set of (real algebraic) polynomial expectations of (arbitrary but fixed) degree d. For states $\mathsf{St}_{CT}$ over n classical variables $y_1, \ldots, y_n$ and m qubits, let $\mathbb{A}^d[\mathsf{St}_{CT}]$ denote the class of polynomial expectations of the form


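$$\lambda\bigl(\{y_i := Y_i\}_{1 \le i \le n},\ (A_{j,k} + i B_{j,k})_{1 \le j,k \le 2^m}\bigr).\ P, \tag{7}$$

where the variables $Y_i$ refer to the classical variables, and the variables $A_{j,k}$ and $B_{j,k}$ refer to the real part and imaginary part, respectively, of each algebraic coefficient in the quantum state. Further, $P \in \mathbb{A}[Y_1, \ldots, Y_n, A_{1,1}, \ldots, A_{2^m,2^m}, B_{1,1}, \ldots, B_{2^m,2^m}]$ is a multivariate polynomial with coefficients in $\mathbb{A}$. The index d refers to the (total) degree of the underlying polynomial P. For instance,

$$\lambda\bigl(\{x := X;\ i := I\},\ (A_{j,k} + i B_{j,k})_{1 \le j,k \le 2}\bigr).\ I + X\bigl(2 - (A_{1,2} + A_{2,1})\bigr) \in \mathbb{A}^2[\mathsf{St}_{CT}].$$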


One important remark here is that we allow for possibly negative polynomials, whereas expectations only output positive real algebraic numbers. Consequently, some side conditions are put on the admissible coefficients $A_{j,k}$ and $B_{j,k}$ of the input density matrix to preserve this condition (the matrix is positive, has trace 1, and is Hermitian). For example, $\sum_j A_{j,j} = 1$ and $\sum_j B_{j,j} = 0$ (trace is 1), and $\forall j, k$: $A_{j,k} = A_{k,j}$ and $B_{j,k} = -B_{k,j}$ (self-adjointness). One can easily check that the expectations defined in Example 5 are in $\mathbb{A}^d[\mathsf{St}_{CT}]$, for $d > 1$.


54 M. Avanzini, G. Moser, R. Péchoux, S. Perdrix 


The restriction to polynomials is made on purpose, as quantifier elimination 
is decidable in the theory of real closed fields, a well known result due to Tarski 
and Seidenberg. Recall that the theory of real closed fields is the first-order 
theory in which the primitive operations are multiplication, addition, the order 
relation <, and the constants 0 and 1. Consequently, the only numbers that 
can be defined are the real algebraic numbers. Specifically, we will make use of the following result, quantifying the complexity of the quantifier elimination decision procedure as a function exponential in the number of variables, and double-exponential in the number of quantifier alternations.


Proposition 1 ([21, Theorem 6]). Let A be an integral ring over a real closed field R. Let $\psi = Q_1 \bar{x}_1.\, Q_2 \bar{x}_2. \cdots Q_l \bar{x}_l.\ \phi$ be a formula in prenex normal form, where $\forall k$, $Q_k \in \{\forall, \exists\}$, $Q_k \neq Q_{k+1}$, and $\phi$ is a quantifier-free formula over i variables and j atomic propositions of the shape $P > 0$, each P being a polynomial of degree at most d with coefficients in A. There exists an algorithm computing a quantifier-free formula equivalent to $\psi$ in time $O(|\psi|) \cdot (jd)^{i^{O(l)}}$.


As $\mathbb{A}$ constitutes both an integral ring and a real closed field, the above proposition is in particular applicable taking $A = R = \mathbb{A}$. In the particular case where $\psi$ is a closed formula, the resulting quantifier-free formula is simply a Boolean combination of inequalities over constants from $\mathbb{A}$. Since we already observed that these can be decided in polynomial time, the above proposition thus implies that validity of $\psi$ is decidable under the given time bound.

By restricting the assignment a to polynomial expectations, it becomes decidable to check that a is a solution to a given constraint set C. Indeed, under such a polynomial assignment a, a constraint $\Gamma \vdash F \le G$ becomes expressible as a formula in the theory of the real closed field $\mathbb{A}$. By letting a range over polynomial expectations with undetermined coefficients, we arrive at the main decidability result of this section.


Theorem 4. For any degree $d \in \mathbb{N}$, $d \ge 1$, the problem $\mathrm{QINFER}(\mathbb{A}^d[\mathsf{St}_{CT}])$ is decidable in time $2^{2^{O(n)}}$, where n is the size of the considered program.


Practical Algorithm. Theorem 4 establishes an algorithm for inferring upper bounds on weakest pre-expectations, and hence on quantitative program properties, of any given mixed classical-quantum program. Nevertheless, the complexity of this algorithm, double-exponential in the program size, is forbiddingly high. In order to turn this procedure into a practical algorithm, we have to tame this inherent complexity. For this, significant further restrictions on the class of bounding functions are necessary. We propose to proceed as follows. (1) Bounding functions: in (7) we restricted the class of expectations to polynomials, which in turn yield a bound on the weakest pre-expectation. Based on an analysis of concrete examples considered in the literature (e.g., [30,6]), this can be tightened further to degree 2 polynomials. (2) Approximate solutions: Theorem 4 rests upon (the decidability of) quantifier elimination. Thus the constraints C induced through the symbolic inference of $\mathsf{qinf}[\mathtt{stm}]\{X\} = G$ ($G, X \in \mathsf{ETerm}$) are solved exactly. Over-approximation, however, suffices if we are only interested in soundness of the inference mechanism.

The restriction of the class of bounding functions is in essence a question of applicability of the automation, taking into account particular use-cases. With respect to approximate solutions, we observe that the actual constraints C considered have at most one quantifier alternation and admit a quantifier prefix of the form $\exists^*\forall^*$, that is, a sequence of existential quantifiers followed by a sequence of universal quantifiers. Roughly speaking, the existential quantifiers refer to the inference of coefficients in the bounding polynomials, while the universal quantifiers refer to program variables. It is well known that universal quantification in optimization problems can be turned into existential quantification using, e.g., Farkas' lemma or generalizations thereof, cf. [38,19]. (See, e.g., [7,29] for instances of this approach for the inference of expected program costs.)
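For orientation, the linear case of this quantifier conversion can be stated as follows. This is the standard affine form of Farkas' lemma, given here as background rather than as the specific generalization the cited works rely on; the symbols $A$, $b$, $c$, $d$ and $\lambda$ are generic and unrelated to the notation used elsewhere in this section.

```latex
% Affine form of Farkas' lemma.
% Assume the polyhedron \{ x \in \mathbb{R}^n \mid A x \le b \} is non-empty.
\[
  \bigl(\forall x.\; A x \le b \implies c^{\top} x \le d\bigr)
  \quad\Longleftrightarrow\quad
  \bigl(\exists \lambda \ge 0.\; \lambda^{\top} A = c^{\top} \,\wedge\, \lambda^{\top} b \le d\bigr)
\]
```

The universally quantified variables $x$ disappear on the right-hand side, which leaves only existential constraints over the multipliers $\lambda$ (and, in a template-based setting, over the unknown coefficients).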

Summarizing, the inference mechanism detailed in Section 5 can be over- 
approximated to generate purely existential constraints. The latter can be effec- 
tively solved via SMT. We expect that (full) automation of the inference mech- 
anism can capitalize on these ideas. Working out the details and in particular 
implementation of an effective prototype is subject to future work. 


6 Conclusion and Future Work 


We have studied the complexity of, and inference techniques for, obtaining quantitative program properties. One particular property of interest is the cost of quantum programs, that is, average time, average number of gates, mean value of a variable, etc. We showed that these problems are undecidable in general by placing them in the arithmetical hierarchy, and saw that inference becomes decidable on a restricted fragment: quantum gates in Clifford+T and a function space with a decidable theory (polynomials of bounded degree over a real closed field). Further, we sketched how the latter can be transformed into an efficient synthesis method.

Many open questions remain. The studied notion of expectation transformer 
describes local properties of the quantum state, while it would be interesting 
to extend this technique to the global state so as to study a mixed state in 
a quantum-only setting (without classical variables and stores). Another ques- 
tion of interest is to what extent a characterization of the quantum class ZBQP, 
the class of problems computed by quantum programs in polynomial expected 
runtime, could be obtained using this tool.

Abstract. Invertible programming languages specify transformations to
be run in two directions, such as compression/decompression or encryp- 
tion/decryption. Two key concepts in invertible programming languages 
are partial invertibility and local invertibility. Partial invertibility lets 
invertible code be parameterized by the results of non-invertible code, 
whereas local invertibility requires all code to be invertible. The former 
allows for more flexible programming, while the latter has connections 
to domains such as low-energy computing and quantum computing. We 
find that existing approaches lack a satisfying treatment of partial in- 
vertibility, leaving the connection to local invertibility unclear. 

In this paper, we identify four core constructs for partially invertible pro- 
gramming, and show how to give them a locally invertible interpretation. 
We show the expressiveness of the constructs by designing the functional 
invertible language KALPIS, and show how to give them a locally invert- 
ible semantics using the novel arrow combinator language RRARR—the 
key idea is viewing partial invertibility as an invertible effect. By for- 
malizing the two systems and giving KALPIS semantics by translation 
to RRARR, we reconcile partial and local invertibility, solving an open 
problem in the field. All formal developments are mechanized in Agda. 


Keywords: Reversible computation - Arrows - Partial invertibility - 
Domain-specific languages. 


1 Introduction


An invertible computation can be run in two ways: forward in the conventional 
way, or backward to recover an input given the output. Such processes appear 
frequently and prominently in a variety of contexts, enabling the shape of in- 
formation to be adapted to different purposes, while preserving the essential 
content. For instance, (lossless) compression shrinks the size of a piece of infor- 
mation to facilitate efficient storage, encryption transforms it to be inaccessible 
to third parties, and serialization reshapes it to enable storage or transmission. 
The property of invertibility is crucial, as it guarantees that the data can always 
be refit to its original purpose. 


© The Author(s) 2024

For example, consider the function autokey below, which computes a variant of the Autokey cipher (see, e.g., [50]). The cipher takes a primer character k, and interprets it as an integer (e.g., 'A' ↦ 0, 'B' ↦ 1, ..., 'Z' ↦ 25) determining a shift to apply to the first element of the input. Each consecutive character in the input is similarly shifted by the amount given by its predecessor. For instance, autokey 'F' "HELLO" = "CXHAD", as 'F' represents a (cyclic left) shift of 5 characters, mapping 'H' to 'C', and 'H' a shift of 7 characters, mapping 'E' to 'X', and so on.


autokey :: Char -> [Char] -> [Char]
autokey k [] = []
autokey k (h : t) =
  shift (chrToInt k) h : autokey h t

autokey' :: Char -> [Char] -> [Char]
autokey' k [] = []
autokey' k (h' : t') =
  let h = shift (-(chrToInt k)) h'
  in h : autokey' h t'

The corresponding decryption function autokey' is the second definition above, and shifts backward to restore the original input. We assume that shift :: Int -> Char -> Char, performing the cyclic shift, is previously defined. This is a simple example, but it serves as a toy model of more advanced encryption schemes and has a few interesting features which we highlight momentarily.
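The two directions can be tried out with a small stand-alone sketch. It is written in Python rather than in the notation of the paper, and the helper names `chr_to_int`, `shift` and `autokey_inv` are assumptions chosen to mirror the definitions above; inputs are assumed to be uppercase ASCII letters.

```python
A = ord("A")


def chr_to_int(c: str) -> int:
    """Interpret an uppercase letter as a number: 'A' -> 0, ..., 'Z' -> 25."""
    return ord(c) - A


def shift(n: int, c: str) -> str:
    """Cyclic left shift of an uppercase letter by n positions."""
    return chr((ord(c) - A - n) % 26 + A)


def autokey(k: str, text: str) -> str:
    """Encrypt: each character is shifted by the amount given by its predecessor."""
    out = []
    for h in text:
        out.append(shift(chr_to_int(k), h))
        k = h  # the plaintext character becomes the key for the next one
    return "".join(out)


def autokey_inv(k: str, text: str) -> str:
    """Decrypt: shift backward, recovering each key from the restored plaintext."""
    out = []
    for h2 in text:
        h = shift(-chr_to_int(k), h2)
        out.append(h)
        k = h  # the recovered plaintext character is the next key
    return "".join(out)
```

As in the unidirectional setting described here, nothing in this code enforces that the two functions are inverses; that is checked only by testing.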

In traditional unidirectional languages, each direction of an invertible algorithm has to be specified separately in this way, and there is no easy way of ensuring that the two programs really constitute each other's inverses. Furthermore, there is a maintenance concern: when one direction is updated, the other has to be updated accordingly. An alternative, more scalable approach is to let a single program denote both directions at the same time; intuitively, the inverse is derived by "reading the original code right-to-left". Invertible programming languages implement this approach, letting each program be executed in either of two directions, which are guaranteed to form a pair of inverses. Examples of invertible languages include Janus [35,53], R, Inv [43], Π, RFun [54], Theseus, CoreFun [25] and SPARCL [39,40].

These languages traditionally require each individual step of computation to be invertible, which can be ensured, e.g., by providing a set of invertible combinators as basic building blocks, or by imposing various syntactic restrictions. This form of local invertibility has several benefits, in addition to being a simple foundation for building programming languages. For example, it was observed early on that discarding information fundamentally results in heat dissipation, meaning that a machine executing only invertible instructions could in principle operate at lower energy levels than a conventional computer [32]. Moreover, locally invertible languages serve as a foundation when considering other domains with similar requirements, such as quantum computing, where computations are composed of individually invertible quantum gates along with irreversible measurements [22,48]. Despite these benefits, the local flavor of invertibility severely limits the flexibility of the programmer. In particular, our example function autokey is not actually invertible up front! The case autokey k [] = [] discards the value of k, which means we cannot simply read the definition right-to-left. Of course, the primer k is not intended to be treated as part of the invertible input to autokey, but rather as a parameter determining the bijection between input and output strings. However, this cannot be naturally expressed in a language adhering strictly to the (locally) invertible paradigm, where the parameter would need to be preserved in the result.

The property of becoming invertible when some parameters are fixed is known as partial invertibility [39,40,44,47], and many previous languages offer some form of support for partially invertible definitions. However, the level of support varies from more limited (e.g., [25,27,35]) to more complete (e.g., [39,40]), and the previous work largely lacks a systematic treatment. The case of autokey is especially tricky, since its invertible input h flows to the unidirectional parameter k in the recursive call. To our knowledge, only SPARCL [39,40] handles cases like this in a systematic way, but it does so through an advanced language foundation quite different from that of traditional invertible languages, and its connection to the locally invertible paradigm is not well understood. Thus, it is an open question whether it is possible to support fully expressive partial invertibility while maintaining a compositional locally invertible interpretation.

It is theoretically known that any (partially) invertible computation can be simulated in a locally invertible system [5]; however, this simulation gives poor control over the invertible behavior and is inefficient in both time and space. There has been research on inversion of arbitrary programs (e.g., [41,44,49]), and on logic languages with no fixed direction of execution, like Prolog and Curry, which use (lazy) generate-and-test to find inputs corresponding to a given output [4]. Yet, these approaches lack the guarantee of invertibility, which is the main motivation of an invertible language.


1.1 Contributions and Organization 


In this paper, we identify a core set of constructs for partially invertible pro- 
gramming, and explain them in terms of a locally invertible semantics. These 
constructs are sufficient to allow expressive partially-invertible and higher-order 
computation, thus solving an open problem in the invertible programming lit- 
erature. The constructs include (1) partially invertible branching, (2) pinning 
invertible inputs, (3) partially invertible composition, and (4) abstraction and 
application of invertible computations. 

We demonstrate the above findings by designing and formalizing two systems based on these constructs, KALPIS⁴ and RRARR. KALPIS is a typed functional programming language accommodating expressive partially-invertible and higher-order computation, and RRARR is an arrow combinator language intended to capture the essence of partially invertible programs. KALPIS is given semantics via RRARR, which captures partial invertibility as an effect on top of 'pure' invertible computations, intuitively adjoining a parameter to an invertible function, analogously to the reader monad in unidirectional computation. By interpreting terms of KALPIS as parameterized bijections, we are able to give a translation into RRARR combinators, giving a compositional embedding into a locally invertible setting. Thus, we present a simple and rigorous take on partial invertibility which bridges the gap between previous work in the field.

4 The name stands for "KALPIS – an Arrow-based Locally and Partially Invertible System".

The core constructs for partial invertibility that we present are not new per 
se, and the features of KALPIS largely coincide with those of SPARCL [39,40].
However, the goal of this paper is not to present KALPIS as such, but rather to 
describe partial invertibility from first principles and give a simpler semantics 
which is compatible with local invertibility. There are key technical differences 
between the two languages, and the fact that they are still similar should be 
taken as a sign that we have achieved our goal without a significant loss of 
expressiveness. 

In summary, our main contributions are: 


– We identify a core set of partially invertible programming constructs (Section 2), which we demonstrate to be sufficient to achieve a level of expressiveness similar to the state-of-the-art.

– We showcase the constructs through the design of the invertible functional language KALPIS, including a formal type system and operational semantics (Section 3).

– We present RRARR, an extension of the irreversibility effect and the reversible reader (Section 4), as a core calculus for partially invertible computation with a locally invertible interpretation.

– We give a compositional translation from KALPIS into RRARR (Section 5).

– We prove type safety and invertibility properties (Section 3), and prove the correctness of the arrow translation (Sections 4 and 5).

– Our developments come with a formalization in Agda (including proofs of all theorems) and a prototype implementation of KALPIS.


Section 6 discusses the results in relation to previous work, and Section 7 concludes.


2 Constructs for Partially Invertible Programming 


In this section, we introduce a set of core constructs for partially invertible 
programming and explain their intuitive idea using programming examples in our 
partially invertible language KALPIS, which we introduce formally in Section 3.
The constructs include (1) partially invertible branching, (2) pinning invertible 
inputs, (3) partially invertible composition, and (4) abstraction and application 
of invertible computations. We explain them each in turn, and show how they 
can be understood as operations on parameterized bijections, which we exploit 
in later sections to embed them into a locally invertible setting. 

These constructs act as a form of glue, allowing invertible and unidirectional computations to be run in tandem. Thus, we also assume some traditional invertible constructs taken from the existing literature, like invertible pattern matching, which we briefly explain where necessary.


https://git.sr.ht/~aathn/kalpis-agda
https://git.sr.ht/~aathn pis


Reconciling Partial and Local Invertibility 63


2.1 Partially Invertible Branching 


As a first example, we define partially invertible addition. In particular, the function $x \mapsto x + n$ has inverse $x \mapsto x - n$ for any $n \in \mathbb{N}$. KALPIS supports recursive type definitions, and we can define the naturals as follows.


data Nat = Z | S Nat


Now, addition is implemented naturally by the following function add, taking an n to produce the corresponding bijection.


sig add : Nat → Nat ↔ Nat
def• add n x =
  case n of
    Z → x
    S n → S (add n • x)


The language uses a functional syntax, and features elements typical of invertible programming: a bijection type A ↔ B, bijection definition def•, and bijection application f • x. The function types associate to the right, so the type of


add : Nat → Nat ↔ Nat


indicates a partially invertible function taking a Nat to produce a bijection Nat ↔ Nat. The case form showcases our first core construct, partially invertible branching. If n is zero, x is returned unchanged, and otherwise S is applied to the result of a recursive computation. The resulting function appends n copies of S to x in the forward direction, or peels them off in the backward direction.

What is interesting is that case results in a loss of information: without prior knowledge of n, it is impossible to determine which branch to choose when executing backwards. This corresponds to the fact that one cannot uniquely determine n and x given y = n + x. However, when n is fixed beforehand, we can refer to its value regardless of executing forwards or backwards, which is what motivates the case construct. For example, we get the following results when applying add to some example inputs, where the primitive operator (·)† : (A ↔ B) → (B ↔ A) lets us compute the inverse.


-- 1 + 2 = 3
> add (S (S Z)) • S Z
S (S (S Z))

-- 3 - 2 = 1
> (add (S (S Z)))† • S (S (S Z))
S Z


As the type Nat ↔ Nat requires, the argument x in the definition of add must be treated linearly, i.e., must be used exactly once in any successful evaluation (see, e.g., [51]) in order to ensure invertibility. For instance, changing the first case above to Z → Z gives an error, as x is unused in the case body. Indeed, if x is never used, there is no way to recover its value in the backward direction. While allowing more than one use does not directly prevent invertibility, it requires implicit copying of values, which may induce unintended runtime failures in the backward execution. Similarly, we cannot branch on x using case for the reasons mentioned above; instead, an invertible case• form is available, explained later.

Note that add is not a total function: e.g., the application (add (S Z))† • Z will try to peel an S when there is none, resulting in a runtime error.⁷ The guarantee given by KALPIS is that whenever evaluating a bijection f on argument v gives v' in the forward direction, then evaluating f on v' gives v in the backward direction, and vice versa (this is made formal in Section 3).

Mathematically, add represents a parameterized bijection, a family of (partial) one-to-one mappings $f_n : \mathbb{N} \to \mathbb{N}$ (such that $f_n(x) = x + n$). This view will underpin our explanation of partially invertible computations in later sections, and each of the core constructs in this section can also be understood from this viewpoint. Seen from this perspective, the case construct allows definitions of the form

$$f_n(x) = \begin{cases} g_n(x) & \text{if } n = 0 \\ h_n(x) & \text{otherwise} \end{cases}$$

where g and h are also parameterized bijections.
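The view of add as a family of bijections indexed by n can be mimicked in an ordinary language. The following Python sketch is an illustration only, not how KALPIS is implemented: the class `Bij` and the helper `pred` are hypothetical names, naturals are modelled as non-negative Python integers, and the backward direction is written by hand instead of being derived.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Bij:
    """A (partial) bijection given by a forward and a backward function."""
    fwd: Callable[[int], int]
    bwd: Callable[[int], int]

    def inverse(self) -> "Bij":
        return Bij(self.bwd, self.fwd)


def pred(y: int) -> int:
    """Peel one successor; fails on zero, like matching S against Z."""
    if y == 0:
        raise ValueError("no S to peel off")
    return y - 1


def add(n: int) -> Bij:
    """Family f_n(x) = x + n, built by branching on the fixed parameter n."""
    if n == 0:                       # branch Z: return x unchanged
        return Bij(lambda x: x, lambda y: y)
    rest = add(n - 1)                # branch S: wrap the recursive result in S
    return Bij(lambda x: rest.fwd(x) + 1,
               lambda y: rest.bwd(pred(y)))
```

Branching on n is harmless in both directions because n is supplied before the bijection runs; only x travels through the forward and backward functions.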


2.2 Pinning Invertible Inputs 


As a second example, we consider a program fib computing pairs of Fibonacci numbers (defined by the equations $F_0 = F_1 = 1$ and $F_{n+1} = F_n + F_{n-1}$ for $n > 0$), a classic in the invertible programming literature (e.g., [18,53]). We can compute fib n by case distinction on n; if n = 0, we return $(F_0, F_1)$, and otherwise we recursively obtain fib (n − 1) = $(F_{n-1}, F_n)$, with which we compute the next pair $(F_n, F_n + F_{n-1})$.

However, if we try to implement this algorithm invertibly using the function add above, we encounter an issue: we cannot make the call add $F_n$ • $F_{n-1}$, as add does not treat its first argument invertibly. Since $F_n$ comes from the invertible input n, we need an operation that is properly invertible in both inputs. To this end, we can define an invertible addition add' such that add' • (x, y) = (x, x + y). By preserving a copy of x in the output, the same x can be used to recover y by subtraction in the inverse direction. Indeed, add' • $(F_n, F_{n-1})$ gives just the result we need. In KALPIS, add' can be derived from add automatically using our second core construct, pin.


sig add' : (Nat, Nat) ↔ (Nat, Nat)
def• add' (x, y) = pin add • (x, y)


Here, the operator pin : (c → a ↔ b) → (c, a) ↔ (c, b) lifts a partially invertible function to operate on invertible data; we refer to this as pinning the invertible input x, allowing it to be used in a unidirectional position. This construct (inherited from SPARCL [39,40]) is crucial in practical programming, as it lets unidirectional computations depend on invertible data in a controlled manner.


7 The loss of totality is unavoidable in order to achieve r-Turing completeness, i.e., the ability to define all computable bijections.


With add' defined, fib can be written as follows.


sig fib : Nat ↔ (Nat, Nat)
def• fib n =
  case• n of
    Z → (S Z, S Z) with is11
    S n → let• (y, x) = fib • n in
          add' • (x, y)
      with not ∘ is11

sig is11 : (Nat, Nat) → Bool
def is11 n =
  case n of
    (S Z, S Z) → True
    _ → False


This example is defined by invertible pattern matching (case•), a construct inherited from previous languages like Janus and Ψ-Lisp [7]. When branching on the input to a bijection (as opposed to a fixed parameter), postconditions marked by the keyword with ensure that the execution can determine which branch to take in the backward direction. Each postcondition is a boolean function that must return True for any result of its branch and False for any result of the branches below it (this is checked at runtime following the symmetric first-match policy [54]). The backward evaluation tests each condition in turn, selecting the first branch whose condition is true. Here, is11 is used to distinguish the base case where the output is (S Z, S Z).


The inverse behavior of fib computes n given a pair $(F_n, F_{n+1})$. Specifically, by computing $F_{n+1} - F_n$, we obtain $F_{n-1}$, and repeating the process until we reach the start of the sequence lets us deduce the index of the initial pair. KALPIS runs fib as below.


-- n = 3 => (F3, F4) = (3, 5)
> fib • S (S (S Z))
(S (S (S Z)), S (S (S (S (S Z)))))

-- (Fn, Fn+1) = (3, 5) => n = 3
> fib† • (S (S (S Z)), S (S (S (S (S Z)))))
S (S (S Z))


Again, fib is non-total: running it backwards on a pair not constituting two 
consecutive Fibonacci numbers will cause the computation to fail. 
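The interplay of pin, add' and the two directions of fib can be sketched in Python. This is an illustration under simplifying assumptions, not the KALPIS implementation: `Bij`, `pin`, `add_pair`, `fib_fwd` and `fib_bwd` are hypothetical names, naturals are plain integers, the backward direction is written by hand, and the with-conditions are only consulted when running backward.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Bij:
    """A (partial) bijection given by a forward and a backward function."""
    fwd: Callable[[Any], Any]
    bwd: Callable[[Any], Any]


def add(n: int) -> Bij:
    """The bijection x -> x + n on naturals, for a fixed parameter n."""
    def bwd(y: int) -> int:
        if y < n:
            raise ValueError("result would be negative")
        return y - n
    return Bij(lambda x: x + n, bwd)


def pin(f: Callable[[Any], Bij]) -> Bij:
    """Lift c -> (a <-> b) to (c, a) <-> (c, b), keeping a copy of c."""
    def fwd(pair):
        c, a = pair
        return (c, f(c).fwd(a))

    def bwd(pair):
        c, b = pair
        return (c, f(c).bwd(b))
    return Bij(fwd, bwd)


add_pair = pin(add)  # add'(x, y) = (x, x + y)


def fib_fwd(n: int):
    if n == 0:
        return (1, 1)
    y, x = fib_fwd(n - 1)
    return add_pair.fwd((x, y))


def fib_bwd(pair):
    if pair == (1, 1):          # postcondition is11 selects the base case
        return 0
    x, y = add_pair.bwd(pair)   # undo add', then undo the recursive call
    return fib_bwd((y, x)) + 1


fib = Bij(fib_fwd, fib_bwd)
```

Running `fib.bwd` on a pair that is not two consecutive Fibonacci numbers eventually attempts a subtraction below zero and fails, which corresponds to the non-totality noted above.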


Viewed as an operation on parameterized bijections, pin lets part of an invertible input be shifted to the parameter position if a copy is returned in the end. Formally, we have $\mathit{pin}(f)_n(x, y) = (x, f_{(n,x)}(y))$; in our example, $f_{(n,x)}$ corresponds to addition by x, ignoring a trivial n representing variables captured in the pin form.

2.3 Partially Invertible Composition


We now return to the example of the introduction, autokey. It can be defined in KALPIS as follows:


sig autokey : Char → [Char] ↔ [Char]
def• autokey k xs =
  case• xs of
    [] → []
    (h : t) → let• (h, r) = pin autokey • (h, t) in
              (shift (chrToInt k) • h) : r


The structure is very similar to the unidirectional version in Section 1 but uses the invertible branching and pinning constructs explained previously. We assume primitives chrToInt : Char → Int and shift : Int → Char ↔ Char for computing and performing the cyclical shifts, respectively. We omit the with-conditions of the invertible match by convention, as the syntactically distinct branch bodies can act as patterns to guide backward branching.

This example features our third core construct, partially invertible composition. This simply refers to the fact that we can modify the parameter of a bijection unidirectionally, as in shift (chrToInt k) • h. In this case, the (irreversible) function chrToInt is applied to k inside the (invertible) call to shift. In other words, the parameter part of an invertible computation is allowed to depend freely on unidirectional computations, greatly enhancing the flexibility when programming. The reason we call it composition is that, from the perspective of parameterized bijections, this corresponds to the composition of a parameterized bijection f with an (arbitrary) function g on the parameter part, i.e., $(f \circ g)_n(x) = f_{g(n)}(x)$. In our example, we have f corresponding to shift and g corresponding to chrToInt.
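Read as an operation on families of bijections, this composition is just pre-processing of the parameter. The following Python sketch makes this concrete; it is an illustration only, with `Bij`, `compose_param` and `shift_by_char` being hypothetical names, and letters assumed to be uppercase ASCII.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Bij:
    """A (partial) bijection given by a forward and a backward function."""
    fwd: Callable[[Any], Any]
    bwd: Callable[[Any], Any]


def chr_to_int(c: str) -> int:
    """Unidirectional helper: 'A' -> 0, ..., 'Z' -> 25."""
    return ord(c) - ord("A")


def shift(n: int) -> Bij:
    """Parameterized bijection: cyclic left shift by n, undone by a right shift."""
    def rot(k: int) -> Callable[[str], str]:
        return lambda c: chr((ord(c) - ord("A") + k) % 26 + ord("A"))
    return Bij(rot(-n), rot(n))


def compose_param(f: Callable[[Any], Bij], g: Callable[[Any], Any]) -> Callable[[Any], Bij]:
    """(f . g)_n = f_{g(n)}: g only touches the parameter, never the invertible data."""
    return lambda n: f(g(n))


shift_by_char = compose_param(shift, chr_to_int)
```

Since g is applied to the parameter in both directions, it does not need an inverse of its own.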

The example also further highlights the utility of pin. As noted in the intro- 
duction, autokey is tricky to express since each character in the invertible output 
depends unidirectionally on the preceding character in the corresponding input. 
Similar patterns also appear in more advanced examples; for instance, consider 
an adaptive compression method where each character in the input must be 
treated invertibly, and yet also be used as part of the (unidirectional) compres- 
sion table. pin enables this sort of dependency in a safe way, letting us use h in 
the recursive call to autokey and returning a copy to use in the output. 
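The role of pin in autokey can be made concrete with a Python sketch. It is an approximation for illustration, not the KALPIS semantics: `Bij`, `pin` and the helpers are hypothetical names, strings stand in for character lists, and the backward function is written by hand as the forward steps in reverse order, whereas KALPIS derives it from a single definition.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Bij:
    """A (partial) bijection given by a forward and a backward function."""
    fwd: Callable[[Any], Any]
    bwd: Callable[[Any], Any]


def chr_to_int(c: str) -> int:
    return ord(c) - ord("A")


def shift(n: int) -> Bij:
    """Cyclic left shift by n on uppercase letters, undone by a right shift."""
    def rot(k: int) -> Callable[[str], str]:
        return lambda c: chr((ord(c) - ord("A") + k) % 26 + ord("A"))
    return Bij(rot(-n), rot(n))


def pin(f: Callable[[Any], Bij]) -> Bij:
    """Lift c -> (a <-> b) to (c, a) <-> (c, b), keeping a copy of c."""
    return Bij(lambda p: (p[0], f(p[0]).fwd(p[1])),
               lambda p: (p[0], f(p[0]).bwd(p[1])))


def autokey(k: str) -> Bij:
    def fwd(xs: str) -> str:
        if not xs:
            return ""
        # h is used as the key of the recursive call and handed back unchanged
        h, r = pin(autokey).fwd((xs[0], xs[1:]))
        return shift(chr_to_int(k)).fwd(h) + r

    def bwd(ys: str) -> str:
        if not ys:
            return ""
        h = shift(chr_to_int(k)).bwd(ys[0])      # undo the shift first
        h, t = pin(autokey).bwd((h, ys[1:]))     # then undo the pinned call
        return h + t
    return Bij(fwd, bwd)
```

The copy of h returned by pin is what allows the backward direction to recover the key for the rest of the string.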

Again, KALPIS lets us execute autokey in either direction, and guarantees that the two are inverses.


> autokey 'F' • "HELLO"
"CXHAD"

> (autokey 'F')† • "CXHAD"
"HELLO"


2.4 Abstraction and Application of Invertible Computations


Our final core construct of partially invertible programming is the ability to abstract and apply invertible computations. Although the examples we have seen so far have defined (partially) invertible computations using the def• keyword in a style close to traditional invertible languages, KALPIS actually features bijections as first-class values and supports proper higher-order programming. Bijections can be constructed with an invertible λ-form λ•x.e analogous to that typical for ordinary functions, and the form def• f x1 x2 ... xn = e is simply syntactic sugar for f = λx1. λx2. ... λ•xn. e. To our knowledge, only SPARCL [39,40] shares this feature, with most invertible languages being limited to first-order computation.

For example, we are able to define multiple variants of the typical map func- 
tion for lists in KALPIS. 


sig map : (a → b) → [a] → [b]        sig mapBij : (a ↔ b) → [a] ↔ [b]
def map f xs =                        def• mapBij f xs =
  case xs of                            case• xs of
    []    → []                            []    → []
    h : t → f h : map f t                 h : t → (f • h) : (mapBij f • t)


Here, map is defined as usual, and maps a function over each element of a list, 
while mapBij makes use of the language’s invertible constructs, taking a bijection 
argument to produce a bijection on lists. For example, using mapBij, the Caesar 
cipher (which shifts each character in the input a fixed number of steps) can be 
defined with a one-liner, as below to the left. 


sig caesar : Char → [Char] ↔ [Char]       sig vig : [Char] → [Char] ↔ [Char]
def caesar k = mapBij (shift k)           def vig ks = apBij (map shift ks)


The function on the right, vig (from Vigenère), takes a list of keys, shifting
each character in the input using the corresponding key; the definition relies on
apBij : [a ↔ b] → [a] ↔ [b] to apply a list of bijections pointwise to a list of
inputs (assuming the two have equal lengths). The latter example demonstrates
that bijections can even occur inside data structures such as lists.
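As a rough illustration outside KALPIS, the following Python sketch models a bijection as a pair of mutually inverse functions and rebuilds mapBij, apBij, caesar, and vig on top of it. The `Bij` record, the integer keys, and the rotation direction of `shift` are assumptions made for this sketch only.

```python
from collections import namedtuple

# A bijection as a pair of functions assumed to be mutually inverse.
Bij = namedtuple("Bij", ["fwd", "bwd"])

A = ord("A")

def shift(n):
    # Hypothetical shift on uppercase letters, keyed by an integer here.
    return Bij(lambda c: chr((ord(c) - A - n) % 26 + A),
               lambda c: chr((ord(c) - A + n) % 26 + A))

def map_bij(f):
    """Lift a bijection on elements to a bijection on lists."""
    return Bij(lambda xs: [f.fwd(x) for x in xs],
               lambda ys: [f.bwd(y) for y in ys])

def ap_bij(fs):
    """Apply a list of bijections pointwise; the lengths must agree."""
    def side(direction):
        def go(xs):
            if len(xs) != len(fs):
                raise ValueError("apBij: length mismatch")
            return [getattr(f, direction)(x) for f, x in zip(fs, xs)]
        return go
    return Bij(side("fwd"), side("bwd"))

def caesar(k):
    return map_bij(shift(k))

def vig(ks):
    # Bijections stored inside an ordinary list.
    return ap_bij([shift(k) for k in ks])
```

Unlike KALPIS, nothing here checks that `fwd` and `bwd` really are inverses; the sketch only shows how bijection values can be passed around and stored like any other data.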

Some restrictions must be observed when dealing with higher-order computation
in KALPIS. The language distinguishes between unidirectional and invertible
terms, and carefully controls the interaction between the two. The restrictions
mean that the invertible fragment of the language is essentially first-order; a
formal account is given in Section 3.4.

Viewed from the perspective of parameterized bijections, abstraction corresponds
to forming the function $n \mapsto f_n$, witnessing that each choice of parameter
$n$ induces a bijection $f_n$, which can be treated as a standalone value. On the
other hand, application of a bijection $a$ corresponds to forming the
parameterized bijection $\mathit{app}_a(x) = a(x)$, where the parameter
determining the bijection is $a$ itself.

This concludes Section 2; for more programming examples in KALPIS, we
refer to the prototype implementation,⁸ which contains a number of nontrivial
programs, including implementations of Huffman coding and sliding-window
compression.


⁸ https://git.sr.ht/~aathn/kalpis


3 The KALPIS Core System


In this section, we formally define the KALPIS core system and state the essential 
metatheoretic properties. A salient feature of the system is the clear separation 
between unidirectional and invertible terms: we have two main syntactic cate- 
gories, two typing relations, and three evaluation relations (one for unidirectional 
terms, and one in each direction for invertible terms). The unidirectional terms 
are a conservative extension of a standard simply-typed call-by-value λ-calculus,
and the invertible terms add support for (partially) invertible computation. 

After introducing the syntax and reviewing some examples, Sections 3.4
and 3.5 give a formal semantics which suggests an interpretation of KALPIS
terms as parameterized bijections. This view is made precise in Sections 4 and 5,
which define a translation from KALPIS into the arrow language RRARR, enabling
a locally invertible interpretation.


3.1 Syntax 


The syntax of KALPIS core is given below, where $u$ denotes unidirectional terms,
$r$ denotes invertible terms, and $p$ denotes patterns. The vector notation
$\overline{t}$ denotes an ordered sequence of elements $t_i$, whose length we
will refer to by $|\overline{t}|$.


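$$
\begin{array}{rcl}
u & ::= & x \mid \lambda x.u \mid u_1\ u_2 \mid \lambda^{\bullet} x.r \mid u_1 \bullet u_2 \mid C\ \overline{u} \mid \mathsf{case}\ u_0\ \mathsf{of}\ \{\overline{p \to u}\} \\
r & ::= & x \mid u \bullet r \mid u^{\dagger} \bullet r \mid \mathsf{pin}\ u \bullet r \\
  & \mid & C\ \overline{r} \mid \mathsf{case}\ u\ \mathsf{of}\ \{\overline{p \to r}\} \mid \mathsf{case}^{\bullet}\ r_0\ \mathsf{of}\ \{\overline{p \to r\ \mathsf{with}\ u}\} \\
p & ::= & C\ \overline{x}
\end{array}
$$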


The syntax of unidirectional terms includes the standard cases for variables,
abstraction and application, along with data constructors and pattern matching.
In addition, there is the invertible abstraction $\lambda^{\bullet} x.r$ and
application $u_1 \bullet u_2$ explained in the previous section. Note that while
the body $r$ is an invertible term, the abstraction itself is unidirectional.

The syntax of invertible terms resembles a first-order functional language, but
with a couple of key additions. We have bijection application $u \bullet r$, where
the bijection is unidirectional whereas the argument is invertible. We also have
fully applied versions of the $(\cdot)^{\dagger}$ and pin operators explained in
the previous section (this is without loss of generality, as e.g., the higher-order
version of pin can be recovered as
$\lambda f.\lambda^{\bullet} x.\ \mathsf{pin}\ f \bullet x$). Partially invertible
branching is represented by the $\mathsf{case}$ form, whose scrutinee $u$ is
unidirectional. The $\mathsf{case}^{\bullet}$ form deconstructs an invertible term,
and has a with-condition for invertible branching, following Janus and Ψ-Lisp [7].
The core constructs of the previous section are all featured explicitly in the
syntax, except for partially invertible composition, which is implicitly performed
whenever a unidirectional term $u$ occurs in an invertible context.


3.2 Types


Next, we define the types of KALPIS core.

$$A, B ::= T\ \overline{B} \mid A \to B \mid A \leftrightarrow B \mid X$$

The types include constructors $T\ \overline{B}$, functions $A \to B$, bijections
$A \leftrightarrow B$ and type variables $X$. The types are conventional with the
exception of invertible computations $A \leftrightarrow B$; this simplicity is a
design feature of KALPIS. With each type constructor $T$ we associate an arity $k$
and a set of constructors $C$ with signatures
$C : A_1 \to A_2 \to \cdots \to A_n \to T\ \overline{B}$, where
$|\overline{B}| = k$. We will assume the type constructors include at least the
unit $1$, products $\otimes$, and sums $\oplus$ with constructors


$$() : 1 \qquad (-,-) : A \to B \to A \otimes B \qquad \mathsf{InL} : A \to A \oplus B \qquad \mathsf{InR} : B \to A \oplus B$$


for any $A$, $B$. We use Bool as a shorthand for $1 \oplus 1$, and True, False as
shorthands for InL (), InR (), respectively.

Types can be (mutually) recursive via constructors; for example, the type
Nat has constructors $\mathsf{Z} : \mathsf{Nat}$ and
$\mathsf{S} : \mathsf{Nat} \to \mathsf{Nat}$. In general, for any fixed $A$, the
recursive type $\mu X.A$ can be represented with a nullary type constructor
$\mathsf{Rec}_A$, with constructor

$$\mathsf{Roll} : A[\mathsf{Rec}_A / X] \to \mathsf{Rec}_A.$$

For instance, $\mathsf{Rec}_{1 \oplus X}$ has constructor
$\mathsf{Roll} : 1 \oplus \mathsf{Rec}_{1 \oplus X} \to \mathsf{Rec}_{1 \oplus X}$,
making it isomorphic to Nat. Technically, we consider a variable $X$ implicitly
bound in the annotation to Rec, and assume all other types are closed.
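The isomorphism between Nat and the rolled sum type can be made concrete with a small Python sketch. The classes `InL`, `InR`, and `Roll` below are stand-ins for the constructors named in the text, and `from_nat` / `to_nat` are hypothetical helper names; nothing here is part of KALPIS itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InL:
    value: object

@dataclass(frozen=True)
class InR:
    value: object

@dataclass(frozen=True)
class Roll:
    body: object

UNIT = ()

def from_nat(n):
    """Z becomes Roll (InL ()); S m becomes Roll (InR <encoding of m>)."""
    if n < 0:
        raise ValueError("natural numbers only")
    v = Roll(InL(UNIT))
    for _ in range(n):
        v = Roll(InR(v))
    return v

def to_nat(v):
    """Count the InR layers, i.e. the number of S constructors."""
    n = 0
    while isinstance(v.body, InR):
        v = v.body.value
        n += 1
    return n
```

The two functions are mutually inverse on well-formed encodings, which is the sense in which the two types carry the same information.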


3.3 Correspondence to the Surface Language 


The correspondence between the core syntax and the examples of Section 2
should be clear. For instance, the examples of addition and Fibonacci number
calculation can be written as follows:


add = fix (λadd'.λn.λ•m.              fib = fixBij (λfib'.λ•n.
  case n of                              case• n of
    Z    → m                               Z    → (S Z, S Z) with is11
    S n' → S (add' n' • m))                S n' → case• fib' • n' of
                                                    (x, y) → pin add • (y, x)
                                                      with λ_. True
                                                  with not ∘ is11)


Here, add is a unidirectional term defined using a fixpoint operator fix, and the
structure is similar to the version presented in Section 2. The function fib is
similarly defined, but uses the fixpoint operator fixBij instead of fix, which works
for bijections instead of functions. We omit the definition of
$\mathsf{is11} : \mathsf{Nat} \otimes \mathsf{Nat} \to \mathsf{Bool}$ in the
interest of space. The term fixBij (and analogously fix) is defined as
below, making use of the language's recursive types.


fixBij = λf. (λg. g (Roll g)) (λx.λ•a. f ((case x of Roll y → y) x) • a)
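The self-application trick behind this definition can be tried out in Python. The sketch below mirrors the unidirectional variant fix rather than fixBij, and uses it to tie the knot for add; bijections are modelled as `Bij(fwd, bwd)` pairs, and `Roll`, `pred`, and `add_step` are illustrative names assumed for this sketch.

```python
from collections import namedtuple

Bij = namedtuple("Bij", ["fwd", "bwd"])

class Roll:
    """Wrapper playing the role of the recursive type's constructor."""
    def __init__(self, unroll):
        self.unroll = unroll

def fix(f):
    # Mirrors: λf. (λg. g (Roll g)) (λx.λa. f ((case x of Roll y → y) x) a)
    return (lambda g: g(Roll(g)))(lambda x: lambda a: f(x.unroll(x))(a))

def pred(v):
    # Running the constructor S backwards fails on Z.
    if v == 0:
        raise ValueError("constructor mismatch: expected S, got Z")
    return v - 1

def add_step(add_rec):
    def add(n):
        if n == 0:                                   # Z -> m
            return Bij(lambda m: m, lambda v: v)
        return Bij(lambda m: 1 + add_rec(n - 1).fwd(m),      # S (add' n' • m)
                   lambda v: add_rec(n - 1).bwd(pred(v)))
    return add

add = fix(add_step)
```

The eta-expansion over `a` delays the self-application, which is what keeps the definition from looping under call-by-value evaluation.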


The type system we define in the next section will assign these terms the following
types as expected.


add : Nat → Nat ↔ Nat            fix    : ((A → B) → A → B) → A → B
fib : Nat ↔ Nat ⊗ Nat            fixBij : ((A ↔ B) → A ↔ B) → A ↔ B


3.4 Type System


Figure 1 shows the typing rules for unidirectional ($\Gamma \vdash u : A$) and
invertible ($\Gamma; \Theta \vdash r : A$) terms. The latter relation uses two
contexts $\Gamma$ and $\Theta$; intuitively, $\Gamma$ contains variables for
unidirectional data, which may be discarded or duplicated freely, whereas $\Theta$
contains variables for data that must be treated in an invertible way. This use of
a dual context system is inspired by previous work such as CoreFun and
SPARCL [39, 40]. Formally, we define the typing contexts as
$\Gamma, \Theta ::= \varepsilon \mid \Gamma, x : A$, and assume names $x$ are
unique within a context. We let $\Gamma_1, \Gamma_2$ denote the concatenation of
two contexts.

The rules for $\Gamma \vdash u : A$ are mostly straightforward. T-ABS• pushes the
parameter $x$ of $\lambda^{\bullet} x.r$ into $\Theta$ instead of $\Gamma$ to
ensure that the variable is used in an invertible way in $r$, and T-RUN gives a
rule for bijection application analogous to T-APP. In the CASE rules, we implicitly
require that patterns are disjoint and exhaustive.

In the rules for $\Gamma; \Theta \vdash r : A$, the variables in the $\Theta$
environments must be used exactly once to ensure invertibility. Hence, we need to
separate $\Theta$ into, e.g., $\Theta = \Theta_1 \uplus \Theta_2$ for typing
subterms, where $\uplus$ is used analogously to a linear type system. The rules
follow the intuition that $r$ denotes a bijection between $\Theta$ and $A$
parameterized by $\Gamma$. This highlights the difference between the pattern
matching rules, T-UCASE and T-RCASE: the bound variables $\Gamma_i$ in the
former are parameters for the bijection that $r_i$ defines, while in the latter, the
variables $\Theta_i$ are part of the inputs of $r_i$, so that
$\mathsf{case}^{\bullet}$ performs a composition of two invertible computations.

As stated in Section 2.4, there are some restrictions on how unidirectional
and invertible terms can interact. Note that the unidirectional subterms
occurring in the invertible typing rules are only typed using $\Gamma$, and not
$\Theta$. For instance, since the left-hand side in rule T-RAPP is unidirectional,
it cannot depend directly on invertible variables, ruling out terms like
$\lambda^{\bullet} x.(x \bullet \mathsf{True})$. This is a natural restriction, as
we cannot generally deduce which function was used to produce some given result.
Conversely, there is no rule for directly accessing $\Gamma$ from the invertible
typing relation; instead, unidirectional data can only affect the computation
through rules like T-UCASE and T-RAPP. Both λ-forms are unidirectional, meaning
they can neither capture invertible variables nor be returned from an invertible
computation. In this sense, the invertible fragment of the language is first-order.

We note that there are no particular restrictions on unidirectional terms,
and the approach presented could be used to augment any standard functional
language with invertible computations $\lambda^{\bullet} x.r$ and
$u_1 \bullet u_2$. The prototype implementation further adds let-polymorphism as
an orthogonal extension.


3.5 Operational Semantics 


We first define the set of values as below. 


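$$v ::= C\ \overline{v} \mid \langle \lambda x.u, \gamma \rangle \mid \langle \lambda^{\bullet} x.r, \gamma \rangle$$


Typing Rules for Unidirectional Terms ($\Gamma \vdash u : A$) and Patterns ($\Gamma \vdash p : A$)

$$
\frac{x : A \in \Gamma}{\Gamma \vdash x : A}\ \textsc{T-UVar}
\qquad
\frac{\Gamma, x : A \vdash u : B}{\Gamma \vdash \lambda x.u : A \to B}\ \textsc{T-Abs}
\qquad
\frac{\Gamma \vdash u_1 : A \to B \quad \Gamma \vdash u_2 : A}{\Gamma \vdash u_1\ u_2 : B}\ \textsc{T-App}
$$

$$
\frac{\Gamma; x : A \vdash r : B}{\Gamma \vdash \lambda^{\bullet} x.r : A \leftrightarrow B}\ \textsc{T-Abs}^{\bullet}
\qquad
\frac{\Gamma \vdash u_1 : A \leftrightarrow B \quad \Gamma \vdash u_2 : A}{\Gamma \vdash u_1 \bullet u_2 : B}\ \textsc{T-Run}
$$

$$
\frac{|\overline{u}| = |\overline{A}| \quad C : \overline{A} \to T\ \overline{B} \quad \{\Gamma \vdash u_i : A_i\}_i}{\Gamma \vdash C\ \overline{u} : T\ \overline{B}}\ \textsc{T-Con}
$$

$$
\frac{\Gamma \vdash u_0 : A \quad \{\Gamma_i \vdash p_i : A \quad \Gamma, \Gamma_i \vdash u_i : B\}_i}{\Gamma \vdash \mathsf{case}\ u_0\ \mathsf{of}\ \{\overline{p \to u}\} : B}\ \textsc{T-Case}
\qquad
\frac{|\overline{x}| = |\overline{A}| \quad C : \overline{A} \to T\ \overline{B}}{\overline{x : A} \vdash C\ \overline{x} : T\ \overline{B}}\ \textsc{T-Pat}
$$

Typing Rules for Invertible Terms ($\Gamma; \Theta \vdash r : A$)

$$
\frac{}{\Gamma; x : A \vdash x : A}\ \textsc{T-RVar}
\qquad
\frac{\Gamma \vdash u : A \leftrightarrow B \quad \Gamma; \Theta \vdash r : A}{\Gamma; \Theta \vdash u \bullet r : B}\ \textsc{T-RApp}
$$

$$
\frac{\Gamma \vdash u : A \leftrightarrow B \quad \Gamma; \Theta \vdash r : B}{\Gamma; \Theta \vdash u^{\dagger} \bullet r : A}\ \textsc{T-Inv}
\qquad
\frac{\Gamma \vdash u : C \to A \leftrightarrow B \quad \Gamma; \Theta \vdash r : C \otimes A}{\Gamma; \Theta \vdash \mathsf{pin}\ u \bullet r : C \otimes B}\ \textsc{T-Pin}
$$

$$
\frac{|\overline{\Theta}| = |\overline{r}| = |\overline{A}| \quad C : \overline{A} \to T\ \overline{B} \quad \{\Gamma; \Theta_i \vdash r_i : A_i\}_i}{\Gamma; \biguplus \overline{\Theta} \vdash C\ \overline{r} : T\ \overline{B}}\ \textsc{T-RCon}
$$

$$
\frac{\Gamma \vdash u : A \quad \{\Gamma_i \vdash p_i : A \quad \Gamma, \Gamma_i; \Theta \vdash r_i : B\}_i}{\Gamma; \Theta \vdash \mathsf{case}\ u\ \mathsf{of}\ \{\overline{p \to r}\} : B}\ \textsc{T-UCase}
$$

$$
\frac{\Gamma; \Theta \vdash r_0 : A \quad \{\Theta_i \vdash p_i : A \quad \Gamma; \Theta' \uplus \Theta_i \vdash r_i : B \quad \Gamma \vdash u_i : B \to \mathsf{Bool}\}_i}{\Gamma; \Theta \uplus \Theta' \vdash \mathsf{case}^{\bullet}\ r_0\ \mathsf{of}\ \{\overline{p \to r\ \mathsf{with}\ u}\} : B}\ \textsc{T-RCase}
$$


Fig. 1. The type system of KALPIS core: $\overline{A} \to B$ means $A_1 \to \cdots \to A_{|\overline{A}|} \to B$.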


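Here, $\gamma$ is a value environment, i.e., a mapping from variables to their
values. Formally, we define $\gamma, \theta ::= \emptyset \mid \gamma, x \mapsto v$,
with $\gamma$ and $\theta$ corresponding to $\Gamma$ and $\Theta$. We use the
disjoint union $\theta_1 \uplus \theta_2$ to concatenate two environments
$\theta_1$ and $\theta_2$, which is defined only when $\mathrm{dom}(\theta_1)$ and
$\mathrm{dom}(\theta_2)$ are disjoint. The values include constructors and two
closure forms $\langle \lambda x.u, \gamma \rangle$ and
$\langle \lambda^{\bullet} x.r, \gamma \rangle$, corresponding to unidirectional
and invertible computations. We type the values in analogy with the terms, with
the rules for closures as follows:

$$
\frac{\gamma : \Gamma \quad \Gamma, x : A \vdash u : B}{\langle \lambda x.u, \gamma \rangle : A \to B}
\qquad
\frac{\gamma : \Gamma \quad \Gamma; x : A \vdash r : B}{\langle \lambda^{\bullet} x.r, \gamma \rangle : A \leftrightarrow B}
$$

Here, we write $\gamma : \Gamma$ to mean that
$\mathrm{dom}(\gamma) = \mathrm{dom}(\Gamma)$ and $\gamma(x) : \Gamma(x)$ for all
$x \in \mathrm{dom}(\Gamma)$. For $p$ a pattern, we write $p\gamma$ to denote the
value obtained by applying the substitution $\gamma$ to $p$'s variables. In
addition, we use the shorthand

$$
(i \doteq j) = \begin{cases} \mathsf{True} & \text{if } i = j \\ \mathsf{False} & \text{otherwise.} \end{cases}
$$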

We now present in Figure 2 the operational semantics of KALPIS core, which
consists of three evaluation relations: unidirectional, forward, and backward. The
unidirectional evaluation relation $\gamma \vdash u \Downarrow v$ reads that under
$\gamma$ term $u$ evaluates to value $v$, as usual. In contrast, the forward and
backward evaluation relations define a bijection. The former relation
$\gamma; \theta \vdash r \Rightarrow v$ reads that under $\gamma$ the forward
evaluation of $r$ maps $\theta$ to $v$, and the latter relation
$\gamma; v \vdash r \Leftarrow \theta$ reads that under $\gamma$ the backward
evaluation of $r$ maps $v$ to $\theta$. As one can see, $\gamma$ serves as
parameter for this bijection that defines a one-to-one correspondence between
$\theta$ and $v$. Due to space limitations, we omit the rules for backward
evaluation, as they are completely symmetric to forward evaluation. That is, for
each rule of the forward evaluation, the corresponding backward rule is obtained by
swapping each occurrence of $\gamma; \theta \vdash r \Rightarrow v$ with
$\gamma; v \vdash r \Leftarrow \theta$, and vice versa. Crucially, the
evaluation relations are mutually dependent, and when a unidirectional term
is embedded in an invertible computation, the unidirectional evaluation will be
invoked to evaluate the term in the same way regardless of whether executing
forwards or backwards.

We encourage the reader to study the rules for partially invertible
$\mathsf{case}$ and invertible $\mathsf{case}^{\bullet}$ especially. The former
branches based on a unidirectional term, which is evaluated first regardless of the
direction of execution. The latter branches based on an invertible term, which is
evaluated first in the forward direction but last in the backward direction. In the
backward direction, the with-conditions $\overline{u}$ are instead evaluated first;
the condition $i \doteq j$ for $j \le i$ encodes the branch selection and the
runtime check of postconditions mentioned previously.

There is a subtlety in the backward evaluation rule for constructors
$C\ \overline{r}$, where the same $C$ occurs both in the term $C\ \overline{r}$ and
the input $C\ \overline{v}$, meaning that evaluation fails if the value does not
match the constructor $C$. This corresponds to, e.g., the term
$(\lambda^{\bullet} x.\ \mathsf{S}\ x)^{\dagger} \bullet \mathsf{Z}$ failing as it
tries to subtract one from zero.
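The interplay of branch selection, with-conditions, and constructor mismatches can be traced on the fib example of Section 3.3. The Python sketch below hand-translates that term into two mutually inverse functions; it is not the paper's interpreter, natural numbers are plain integers, and `Failure`, `check_conds`, and `select_branch` are names assumed for this sketch.

```python
class Failure(Exception):
    """Evaluation failed: mismatched constructor or rejected with-condition."""

def is11(v):
    return v == (1, 1)

# with-conditions of the two branches of fib, in order
CONDS = [is11, lambda v: not is11(v)]

def check_conds(i, v):
    # Forward: after branch i produced v, condition j must equal (i == j) for j <= i.
    for j in range(i + 1):
        if CONDS[j](v) != (i == j):
            raise Failure(f"with-condition {j} rejects {v}")

def select_branch(v):
    # Backward: the with-conditions run first and pick the branch.
    for i, cond in enumerate(CONDS):
        if cond(v):
            return i
    raise Failure(f"no with-condition accepts {v}")

def fib_fwd(n):
    if n == 0:                      # Z -> (S Z, S Z) with is11
        i, v = 0, (1, 1)
    else:                           # S n' -> ... with not . is11
        x, y = fib_fwd(n - 1)
        i, v = 1, (y, x + y)        # pin add • (y, x)
    check_conds(i, v)
    return v

def fib_bwd(v):
    i = select_branch(v)
    if i == 0:
        return 0                    # is11 holds, so v matches (S Z, S Z)
    y, s = v
    if s < y:                       # add y cannot be undone: S expected, Z found
        raise Failure("mismatched constructor while undoing add")
    return 1 + fib_bwd((s - y, y))
```

Running `fib_bwd` on a pair that is not in the image of `fib_fwd`, such as (2, 2), ends in a `Failure`, matching the two kinds of run-time error the semantics allows.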


3.6 Metatheory 


In this section, we briefly state the essential properties of the core system. The
propositions in this section have been formalized mechanically, by implementing
and reasoning about a definitional interpreter in Agda. The implementation
follows the presentation of the paper closely, but uses intrinsically-typed terms
and nameless variables, and relies on the sized delay monad.


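Theorem 1 (Subject reduction).
— If $\Gamma \vdash u : A$, $\gamma : \Gamma$ and $\gamma \vdash u \Downarrow v$, then $v : A$.
— If $\Gamma; \Theta \vdash r : A$, $\gamma : \Gamma$, $\theta : \Theta$ and $\gamma; \theta \vdash r \Rightarrow v$, then $v : A$.
— If $\Gamma; \Theta \vdash r : A$, $\gamma : \Gamma$, $v : A$ and $\gamma; v \vdash r \Leftarrow \theta$, then $\theta : \Theta$.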


Proof. Directly from the existence and type of the definitional interpreter in 
Agda. 


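Theorem 2 (Invertibility). If $\Gamma; \Theta \vdash r : A$, $\gamma : \Gamma$,
$\theta : \Theta$ and $v : A$, then $\gamma; \theta \vdash r \Rightarrow v$ if and
only if $\gamma; v \vdash r \Leftarrow \theta$.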


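Proof. By simultaneous induction on the term $r$ and the step count of evaluation;
simple induction on the term $r$ is not enough as the language has general
recursion. The proof is otherwise straightforward, since the evaluation relations
are completely symmetric.


Unidirectional Evaluation ($\gamma \vdash u \Downarrow v$)

$$
\frac{\gamma(x) = v}{\gamma \vdash x \Downarrow v}
\qquad
\frac{}{\gamma \vdash \lambda x.u \Downarrow \langle \lambda x.u, \gamma \rangle}
\qquad
\frac{}{\gamma \vdash \lambda^{\bullet} x.r \Downarrow \langle \lambda^{\bullet} x.r, \gamma \rangle}
$$

$$
\frac{\gamma \vdash u_1 \Downarrow \langle \lambda x.u', \gamma' \rangle \quad \gamma \vdash u_2 \Downarrow v_2 \quad \gamma', x \mapsto v_2 \vdash u' \Downarrow v}{\gamma \vdash u_1\ u_2 \Downarrow v}
\qquad
\frac{\{\gamma \vdash u_i \Downarrow v_i\}_i}{\gamma \vdash C\ \overline{u} \Downarrow C\ \overline{v}}
$$

$$
\frac{\gamma \vdash u_1 \Downarrow \langle \lambda^{\bullet} x.r, \gamma' \rangle \quad \gamma \vdash u_2 \Downarrow v_2 \quad \gamma'; (x \mapsto v_2) \vdash r \Rightarrow v}{\gamma \vdash u_1 \bullet u_2 \Downarrow v}
\qquad
\frac{\gamma \vdash u_0 \Downarrow p_i\gamma_i \quad \gamma, \gamma_i \vdash u_i \Downarrow v}{\gamma \vdash \mathsf{case}\ u_0\ \mathsf{of}\ \{\overline{p \to u}\} \Downarrow v}
$$

Forward (and Backward) Evaluation ($\gamma; \theta \vdash r \Rightarrow v$ and $\gamma; v \vdash r \Leftarrow \theta$)

$$
\frac{}{\gamma; (x \mapsto v) \vdash x \Rightarrow v}
\qquad
\frac{\gamma \vdash u \Downarrow \langle \lambda^{\bullet} x.r', \gamma' \rangle \quad \gamma; \theta \vdash r \Rightarrow v \quad \gamma'; (x \mapsto v) \vdash r' \Rightarrow v'}{\gamma; \theta \vdash u \bullet r \Rightarrow v'}
$$

$$
\frac{\gamma \vdash u \Downarrow \langle \lambda^{\bullet} x.r', \gamma' \rangle \quad \gamma; \theta \vdash r \Rightarrow v \quad \gamma'; v \vdash r' \Leftarrow (x \mapsto v')}{\gamma; \theta \vdash u^{\dagger} \bullet r \Rightarrow v'}
\qquad
\frac{\{\gamma; \theta_i \vdash r_i \Rightarrow v_i\}_i}{\gamma; \biguplus \overline{\theta} \vdash C\ \overline{r} \Rightarrow C\ \overline{v}}
$$

$$
\frac{\gamma \vdash u \Downarrow \langle \lambda x.u', \gamma' \rangle \quad \gamma; \theta \vdash r \Rightarrow (v_1, v_2) \quad \gamma', x \mapsto v_1 \vdash u' \Downarrow \langle \lambda^{\bullet} y.r', \gamma'' \rangle \quad \gamma''; (y \mapsto v_2) \vdash r' \Rightarrow v_3}{\gamma; \theta \vdash \mathsf{pin}\ u \bullet r \Rightarrow (v_1, v_3)}
$$

$$
\frac{\gamma \vdash u \Downarrow p_i\gamma_i \quad \gamma, \gamma_i; \theta \vdash r_i \Rightarrow v}{\gamma; \theta \vdash \mathsf{case}\ u\ \mathsf{of}\ \{\overline{p \to r}\} \Rightarrow v}
$$

$$
\frac{\gamma; \theta \vdash r_0 \Rightarrow p_i\theta_i \quad \gamma; \theta_i \uplus \theta' \vdash r_i \Rightarrow v \quad \{\gamma \vdash u_j \Downarrow \langle \lambda x.u', \gamma' \rangle \quad \gamma', x \mapsto v \vdash u' \Downarrow (i \doteq j)\}_{j \le i}}{\gamma; \theta \uplus \theta' \vdash \mathsf{case}^{\bullet}\ r_0\ \mathsf{of}\ \{\overline{p \to r\ \mathsf{with}\ u}\} \Rightarrow v}
$$


Fig. 2. The operational semantics of KALPIS core. Rules for the backward evaluation
are omitted in the interest of space, but can be derived as explained in the text.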


Remark on Progress. We have chosen to give the semantics in a big-step style 
in this paper. This choice was made both because the invertibility property is 
more natural to state about a big-step semantics, which relates input to output 
directly, and to make the step to a denotational semantics smaller—as men- 
tioned, the evaluation relations suggest an interpretation of invertible terms as 
parameterized bijections. 

Thus, the progress property typically proven for a small-step semantics,
meaning that evaluation never gets “stuck” given a valid input (see, e.g., [45]), is
not straightforward to state in our case. However, we get a similar guarantee from
the implementation in Agda, whose type checker asserts that no uncontrolled run-time
errors are possible. Indeed, the only errors that can occur during evaluation are
those caused by imprecise with-conditions or mismatched constructors.


4 Arrows for Partial and Local Invertibility 


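While the core system of KALPIS presented in the previous section is simple
and illuminating, it only offers an operational understanding of the language.
Furthermore, it depends on a unidirectional evaluation, which does not fit in a
locally invertible setting. We want to get at the essence of partially invertible
programming, and show that partial and local invertibility can be reconciled,
which is the focus of this section.


Syntax

$$
\begin{array}{rcl}
A, B & ::= & 1 \mid A \oplus B \mid A \otimes B \mid \mu X.A \\
\tau & ::= & A \rightleftharpoons B \mid A \rightsquigarrow B \mid C \cdot A \leftrightsquigarrow B \\
\mu & ::= & \mathsf{arr}_u\ c \mid \mu_1 \ggg_u \mu_2 \mid \mathsf{first}_u\ \mu \mid \mathsf{left}_u\ \mu \mid \mathsf{clone} \mid \mathsf{run}\ \alpha \\
\alpha & ::= & \mathsf{arr}_r\ c \mid \alpha_1 \ggg_r \alpha_2 \mid \mathsf{first}_r\ \alpha \mid \mathsf{left}_r\ \alpha \mid \alpha^{\dagger} \mid \mathsf{case!}\ \alpha_1\ \alpha_2 \mid \mathsf{pin}\ \alpha \mid \mu \ggg! \alpha
\end{array}
$$

Typing Rules for Arrows ($\mu : A \rightsquigarrow B$ and $\alpha : C \cdot A \leftrightsquigarrow B$)

$$
\frac{c : A \rightleftharpoons B}{\mathsf{arr}_u\ c : A \rightsquigarrow B}
\qquad
\frac{\mu_1 : A \rightsquigarrow B \quad \mu_2 : B \rightsquigarrow C}{\mu_1 \ggg_u \mu_2 : A \rightsquigarrow C}
\qquad
\frac{\mu : A \rightsquigarrow B}{\mathsf{first}_u\ \mu : A \otimes C \rightsquigarrow B \otimes C}
$$

$$
\frac{\mu : A \rightsquigarrow B}{\mathsf{left}_u\ \mu : A \oplus C \rightsquigarrow B \oplus C}
\qquad
\frac{}{\mathsf{clone} : A \rightsquigarrow A \otimes A}
\qquad
\frac{\alpha : C \cdot A \leftrightsquigarrow B}{\mathsf{run}\ \alpha : C \otimes A \rightsquigarrow B}
$$

$$
\frac{c : A \rightleftharpoons B}{\mathsf{arr}_r\ c : C \cdot A \leftrightsquigarrow B}
\qquad
\frac{\alpha_1 : D \cdot A \leftrightsquigarrow B \quad \alpha_2 : D \cdot B \leftrightsquigarrow C}{\alpha_1 \ggg_r \alpha_2 : D \cdot A \leftrightsquigarrow C}
\qquad
\frac{\alpha : D \cdot A \leftrightsquigarrow B}{\mathsf{first}_r\ \alpha : D \cdot (A \otimes C) \leftrightsquigarrow B \otimes C}
$$

$$
\frac{\alpha : D \cdot A \leftrightsquigarrow B}{\mathsf{left}_r\ \alpha : D \cdot (A \oplus C) \leftrightsquigarrow B \oplus C}
\qquad
\frac{\alpha : C \cdot A \leftrightsquigarrow B}{\alpha^{\dagger} : C \cdot B \leftrightsquigarrow A}
\qquad
\frac{\alpha_1 : C \cdot A \leftrightsquigarrow B \quad \alpha_2 : D \cdot A \leftrightsquigarrow B}{\mathsf{case!}\ \alpha_1\ \alpha_2 : (C \oplus D) \cdot A \leftrightsquigarrow B}
$$

$$
\frac{\mu : C \rightsquigarrow D \quad \alpha : D \cdot A \leftrightsquigarrow B}{\mu \ggg! \alpha : C \cdot A \leftrightsquigarrow B}
\qquad
\frac{\alpha : (C \otimes D) \cdot A \leftrightsquigarrow B}{\mathsf{pin}\ \alpha : C \cdot (D \otimes A) \leftrightsquigarrow D \otimes B}
$$


Fig. 3. The syntax and types of RRARR: $A$ and $B$ denote base types, $\tau$ denotes
combinator types, $c$ denotes bijections, $\mu$ denotes unidirectional arrow
combinators and $\alpha$ denotes invertible arrow combinators.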


In what follows, we define RRARR, a low-level language based on arrow
combinators, intended to capture the essence of partially invertible computation.
The operations of RRARR directly correspond to the core constructs of Section 2
and have an immediate interpretation in terms of abstract functions and
parameterized bijections. What is more, we show that they have an alternative,
compositional and locally invertible interpretation using an idea similar to
the reader monad in unidirectional computation (based on the irreversibility
effect and the reversible reader [23]). This property is not obvious for KALPIS,
not to mention earlier work such as SPARCL [39, 40].

We begin by explaining the syntax and semantics of a first-order fragment of
RRARR, before proceeding to give its locally invertible interpretation. We then
extend this fragment to match the full expressiveness of KALPIS in Section 4.5
with operations for higher-order computation. In Section 5, we top it all off by
giving a formal translation from KALPIS core to RRARR.


4.1 Syntax and Type System of RRARR 


Figure 3 shows the syntax and type system of RRARR (where base bijections $c$
of type $A \rightleftharpoons B$ are kept abstract). The language involves
unidirectional ($\mu$) and invertible ($\alpha$) terms, similarly to KALPIS. Both
kinds of terms form arrows over bijections, through the combinators
$\mathsf{arr}$, $\ggg$, and $\mathsf{first}$.

The former arrow, denoted by $\mu : A \rightsquigarrow B$, intuitively represents
an ordinary function; $\mathsf{arr}_u\ c$ extracts the forward semantics of a
bijection $c$, $\mu_1 \ggg_u \mu_2$ composes two functions $\mu_1$ and $\mu_2$, and
$\mathsf{first}_u\ \mu$ simply applies $\mu$ to the first component of the input.
The unidirectional arrows also feature $\mathsf{left}_u$, the sum counterpart of
$\mathsf{first}_u$, and allow copying data through $\mathsf{clone}$.

The latter arrow, denoted by $\alpha : C \cdot A \leftrightsquigarrow B$,
represents bijections between $A$ and $B$ parameterized by $C$;
$\mathsf{arr}_r\ c$ constructs a parameterized bijection that behaves as the
bijection $c$ ignoring any parameter, $\alpha_1 \ggg_r \alpha_2$ composes the two
bijections obtained by passing the parameter to both $\alpha_1$ and $\alpha_2$, and
$\mathsf{first}_r\ \alpha$ applies the bijection determined by $\alpha$ to the
first component of the input. These arrows also support $\mathsf{left}_r$, and
form an inverse arrow through a dagger operator $\alpha^{\dagger}$, which undoes
$\alpha$ and its effect.

What is special in RRARR is the communication between the two arrows
through $\mathsf{case!}$, $\mathsf{pin}$, $\ggg!$, and $\mathsf{run}$, where the
former three directly correspond to the core constructs of Section 2. The term
$\mathsf{case!}\ \alpha_1\ \alpha_2$ performs partially invertible branching,
running $\alpha_1$ or $\alpha_2$ depending on the value of its parameter.
The term $\mathsf{pin}\ \alpha$ corresponds to the pinning construct; in RRARR,
this operation moves part of the input ($D$) into the parameter ($C \otimes D$) of
$\alpha$. The term $\mu \ggg! \alpha$ represents partially invertible composition
of the function $\mu$ with the parameterized bijection $\alpha$. Finally, the
operator $\mathsf{run}$ allows converting a parameterized bijection
$C \cdot A \leftrightsquigarrow B$ to a function $C \otimes A \rightsquigarrow B$
by extracting its forward semantics. This can be seen as a special case of applying
invertible computations (in a unidirectional context); the treatment of abstraction
and application supporting higher-order computation is left for Section 4.5, as it
requires a slight extension.

It is worth noting that invertible arrows are inherently allowed to ignore their
parameter (through $\mathsf{arr}_r$), a fact that can be used to derive the crucial
erasure operation in unidirectional arrows. In particular, supposing
$\mathsf{id} : A \rightleftharpoons A$, we get the term
$\mathsf{run}\ (\mathsf{arr}_r\ \mathsf{id}) : C \otimes 1 \rightsquigarrow 1$,
which ignores any input $C$ to return $()$.


4.2 Semantics of RRARR 


We now formalize the intuitive interpretation through the semantics presented
in Figure 4. We define a base set of values containing unit, pairs, and tagged
values, which we type in the conventional way. Recursively typed values
$\mathsf{roll}\ w$ are only manipulated by the base invertible combinators $c$.


$$w ::= () \mid (w_1, w_2) \mid \mathsf{inl}\ w \mid \mathsf{inr}\ w \mid \mathsf{roll}\ w$$


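The semantics of RRARR again takes the form of three relations: one for
unidirectional arrows and two for invertible arrows. The first
($\mu\ w_1 \mapsto w_2$) reads that $\mu$ maps $w_1$ to $w_2$, confirming the
intuition that unidirectional arrows represent functions. The second
($\alpha\ w; w_1 \mapsto w_2$) and third ($\alpha\ w; w_1 \mapsfrom w_2$) read that
given parameter $w$, $\alpha$ maps $w_1$ to $w_2$ under the forward (resp.
backward) evaluation, confirming the intuition that our invertible arrows
correspond to parameterized bijections. The rules closely follow the informal
descriptions presented in the previous section. We assume a base invertible
semantics for combinators $c$ of the form $c\ w_1 \mapsto w_2$, invoked by the
rules concerning $\mathsf{arr}$ for each arrow.


Unidirectional Evaluation ($\mu\ w_1 \mapsto w_2$)

$$
\frac{c\ w_1 \mapsto w_2}{(\mathsf{arr}_u\ c)\ w_1 \mapsto w_2}
\qquad
\frac{\mu_1\ w_1 \mapsto w_2 \quad \mu_2\ w_2 \mapsto w_3}{(\mu_1 \ggg_u \mu_2)\ w_1 \mapsto w_3}
\qquad
\frac{\mu\ w_1 \mapsto w_2}{(\mathsf{first}_u\ \mu)\ (w_1, w_3) \mapsto (w_2, w_3)}
$$

$$
\frac{\mu\ w_1 \mapsto w_2}{(\mathsf{left}_u\ \mu)\ (\mathsf{inl}\ w_1) \mapsto \mathsf{inl}\ w_2}
\qquad
\frac{}{(\mathsf{left}_u\ \mu)\ (\mathsf{inr}\ w_1) \mapsto \mathsf{inr}\ w_1}
\qquad
\frac{}{\mathsf{clone}\ w_1 \mapsto (w_1, w_1)}
$$

$$
\frac{\alpha\ w; w_1 \mapsto w_2}{(\mathsf{run}\ \alpha)\ (w, w_1) \mapsto w_2}
$$

Forward (and Backward) Evaluation ($\alpha\ w; w_1 \mapsto w_2$ and $\alpha\ w; w_1 \mapsfrom w_2$)

$$
\frac{c\ w_1 \mapsto w_2}{(\mathsf{arr}_r\ c)\ w; w_1 \mapsto w_2}
\qquad
\frac{\alpha_1\ w; w_1 \mapsto w_2 \quad \alpha_2\ w; w_2 \mapsto w_3}{(\alpha_1 \ggg_r \alpha_2)\ w; w_1 \mapsto w_3}
\qquad
\frac{\alpha\ w; w_1 \mapsto w_2}{(\mathsf{first}_r\ \alpha)\ w; (w_1, w_3) \mapsto (w_2, w_3)}
$$

$$
\frac{\alpha\ w; w_1 \mapsto w_2}{(\mathsf{left}_r\ \alpha)\ w; \mathsf{inl}\ w_1 \mapsto \mathsf{inl}\ w_2}
\qquad
\frac{}{(\mathsf{left}_r\ \alpha)\ w; \mathsf{inr}\ w_1 \mapsto \mathsf{inr}\ w_1}
\qquad
\frac{\alpha\ w; w_1 \mapsfrom w_2}{\alpha^{\dagger}\ w; w_1 \mapsto w_2}
$$

$$
\frac{\alpha_1\ w; w_1 \mapsto w_2}{(\mathsf{case!}\ \alpha_1\ \alpha_2)\ (\mathsf{inl}\ w); w_1 \mapsto w_2}
\qquad
\frac{\alpha_2\ w; w_1 \mapsto w_2}{(\mathsf{case!}\ \alpha_1\ \alpha_2)\ (\mathsf{inr}\ w); w_1 \mapsto w_2}
\qquad
\frac{\mu\ w \mapsto w' \quad \alpha\ w'; w_1 \mapsto w_2}{(\mu \ggg! \alpha)\ w; w_1 \mapsto w_2}
$$

$$
\frac{\alpha\ (w, w_1); w_2 \mapsto w_3}{(\mathsf{pin}\ \alpha)\ w; (w_1, w_2) \mapsto (w_1, w_3)}
$$


Fig. 4. The semantics of RRARR. As before, the backward evaluation rules are
symmetrically obtained from the forward rules.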


The semantics satisfies the desired properties of subject reduction and
invertibility, although we refer to our mechanized formalization for the details.⁹
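To make the direct semantics tangible, here is a minimal Python sketch of the invertible combinators as parameterized bijections. It is an illustration only: unidirectional arrows are plain Python functions, sums are tagged tuples such as `("inl", w)`, and every name (`PBij`, `case_bang`, `comp_bang`, and so on) is assumed for this sketch rather than taken from the mechanized formalization.

```python
from collections import namedtuple

# Invertible arrow C . A <~> B: fwd(w, x) and bwd(w, y) for a parameter w.
PBij = namedtuple("PBij", ["fwd", "bwd"])

def arr_r(c_fwd, c_bwd):          # lifts a base bijection, ignoring the parameter
    return PBij(lambda w, x: c_fwd(x), lambda w, y: c_bwd(y))

def seq_r(a1, a2):                # both stages receive the same parameter
    return PBij(lambda w, x: a2.fwd(w, a1.fwd(w, x)),
                lambda w, y: a1.bwd(w, a2.bwd(w, y)))

def first_r(a):
    return PBij(lambda w, p: (a.fwd(w, p[0]), p[1]),
                lambda w, p: (a.bwd(w, p[0]), p[1]))

def dagger(a):
    return PBij(a.bwd, a.fwd)

def case_bang(a1, a2):            # the parameter is ("inl", w) or ("inr", w)
    def pick(t):
        return a1 if t[0] == "inl" else a2
    return PBij(lambda t, x: pick(t).fwd(t[1], x),
                lambda t, y: pick(t).bwd(t[1], y))

def pin(a):                       # first input component joins the parameter
    return PBij(lambda w, p: (p[0], a.fwd((w, p[0]), p[1])),
                lambda w, p: (p[0], a.bwd((w, p[0]), p[1])))

def comp_bang(mu, a):             # mu >>>! a, with mu an ordinary function
    return PBij(lambda w, x: a.fwd(mu(w), x),
                lambda w, y: a.bwd(mu(w), y))

def run(a):                       # forward semantics as a function on pairs
    return lambda p: a.fwd(p[0], p[1])

# Illustrative parameterized bijection over the integers.
add = PBij(lambda n, m: m + n, lambda n, s: s - n)
pin_add = pin(comp_bang(lambda p: p[1], add))
```

In every combinator the backward function reuses the parameter untouched, which is why the parameter can be computed by arbitrary one-way code.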


4.3 Locally Invertible Interpretation 


Recall that our goal is to define a locally invertible interpretation, whereas the
straightforward semantics of Section 4.2 depended on a unidirectional evaluation.
In this section, we give an alternative interpretation of RRARR, utilizing the
reversible reader (RReader) to interpret the invertible arrow combinators.

$$[\![ C \cdot A \leftrightsquigarrow B ]\!] = \mathsf{RReader}\ C\ A\ B$$

Here, $\mathsf{RReader}\ C\ A\ B$ consists of the bijections of type
$C \otimes A \rightleftharpoons C \otimes B$ that keep the $C$ part unchanged. This
arrow was originally introduced with the intention of modelling a bijection with
some “static” input $C$ [23]. Regarding $\rightsquigarrow$, we use the
irreversibility effect that leverages the fact that every unidirectional
computation can be simulated by a locally invertible computation yielding
“garbage” [8], as:

$$[\![ A \rightsquigarrow B ]\!] = \exists G.\ A \rightleftharpoons G \otimes B$$
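Both interpretations can be mimicked in a few lines of Python. The sketch below is only an illustration under simplifying assumptions: the garbage type is taken to be the input itself, bijections are pairs of Python functions, and `with_garbage` and `rreader` are hypothetical helper names.

```python
def with_garbage(f):
    """Simulate f : A -> B by a bijection A <-> G x B, here with G = A."""
    def fwd(a):
        return (a, f(a))
    def bwd(pair):
        g, b = pair
        if f(g) != b:
            raise ValueError("pair is outside the image of the forward map")
        return g
    return fwd, bwd

def rreader(p_fwd, p_bwd):
    """View a parameterized bijection as C x A <-> C x B, returning C unchanged."""
    def fwd(pair):
        c, a = pair
        return (c, p_fwd(c, a))
    def bwd(pair):
        c, b = pair
        return (c, p_bwd(c, b))
    return fwd, bwd

# A non-injective function (len) and a parameterized bijection (adding c).
length_fwd, length_bwd = with_garbage(len)
add_fwd, add_bwd = rreader(lambda n, m: m + n, lambda n, s: s - n)
```

Keeping the whole input as garbage is the crudest possible choice; it is enough to show why the backward map is only partial.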


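Combining these two effects is a novel point of RRARR; in particular, we
contribute the core constructs of $\mathsf{case!}$, $\ggg!$, $\mathsf{pin}$ and
$\mathsf{run}$, which enable communication between the two. Locally invertible
interpretations of the primitives in each system have been given in the existing
results. Here, we extend the results with the operations novel to RRARR, to show
that the two systems together give a locally invertible model of partially
invertible computations.


⁹ https://git.sr.ht/~aathn/kalpis-agda


$$
\begin{array}{ll}
\mathsf{unitel}_{\times} : 1 \otimes A \rightleftharpoons A &
\mathsf{assocl}_{+} : A \oplus (B \oplus C) \rightleftharpoons (A \oplus B) \oplus C \\
\mathsf{swap}_{+} : A \oplus B \rightleftharpoons B \oplus A &
\mathsf{distr} : (A \oplus B) \otimes C \rightleftharpoons A \otimes C \oplus B \otimes C \\
\mathsf{assocl}_{\times} : A \otimes (B \otimes C) \rightleftharpoons (A \otimes B) \otimes C &
\mathsf{inl} : A \rightleftharpoons A \oplus B \\
\mathsf{swap}_{\times} : A \otimes B \rightleftharpoons B \otimes A &
\mathsf{roll} : A[\mu X.A / X] \rightleftharpoons \mu X.A
\end{array}
$$

$$
\frac{}{\mathsf{id} : A \rightleftharpoons A}
\qquad
\frac{c : A \rightleftharpoons B}{c^{\dagger} : B \rightleftharpoons A}
\qquad
\frac{c_1 : A \rightleftharpoons B \quad c_2 : B \rightleftharpoons C}{c_1 \mathbin{;} c_2 : A \rightleftharpoons C}
$$

$$
\frac{c_1 : A \rightleftharpoons C \quad c_2 : B \rightleftharpoons D}{c_1 \otimes c_2 : A \otimes B \rightleftharpoons C \otimes D}
\qquad
\frac{c_1 : A \rightleftharpoons C \quad c_2 : B \rightleftharpoons D}{c_1 \oplus c_2 : A \oplus B \rightleftharpoons C \oplus D}
$$


Fig. 5. The invertible primitives of Π° [26]. Note that we replace the looping construct
trace with the derived inl for simplicity (Section 4.5 recovers the expressiveness of this
combinator).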

As our target invertible language, we use Π° [26], whose combinators $c$
constitute a minimal set of (non-total) invertible operations. The combinators
support sequential composition ($c_1 \mathbin{;} c_2$), parallel composition
($c_1 \otimes c_2$ and $c_1 \oplus c_2$), and importantly, a local inversion
operator ($c^{\dagger}$) such that
$(c_1 \mathbin{;} c_2)^{\dagger} = c_2^{\dagger} \mathbin{;} c_1^{\dagger}$.
Figure 5 shows a summary of the primitives; their behavior should be obvious from
the types (see the Agda formalization for details).

We now proceed to give another interpretation of the core constructs of 
RRARR. 


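Partially invertible branching. Given $\alpha_1$ and $\alpha_2$ with
$[\![\alpha_1]\!] : C \otimes A \rightleftharpoons C \otimes B$
and $[\![\alpha_2]\!] : D \otimes A \rightleftharpoons D \otimes B$, we must
construct

$$[\![\mathsf{case!}\ \alpha_1\ \alpha_2]\!] : (C \oplus D) \otimes A \rightleftharpoons (C \oplus D) \otimes B.$$

Using distr, we can convert $(C \oplus D) \otimes A$ to
$C \otimes A \oplus D \otimes A$, after which $[\![\alpha_1]\!]$ and
$[\![\alpha_2]\!]$ can be run in parallel. Factoring out the $B$, we get the
required transformation.

$$[\![\mathsf{case!}\ \alpha_1\ \alpha_2]\!] = \mathsf{distr} \mathbin{;} [\![\alpha_1]\!] \oplus [\![\alpha_2]\!] \mathbin{;} \mathsf{distr}^{\dagger}$$

Pinning. Given $\alpha$ with
$[\![\alpha]\!] : (C \otimes D) \otimes A \rightleftharpoons (C \otimes D) \otimes B$,
we must produce

$$[\![\mathsf{pin}\ \alpha]\!] : C \otimes (D \otimes A) \rightleftharpoons C \otimes (D \otimes B).$$

As the reversible reader arrow $[\![\alpha]\!]$ already returns the context
unchanged, we only need to shuffle the inputs and outputs appropriately.

$$[\![\mathsf{pin}\ \alpha]\!] = \mathsf{assocl}_{\times} \mathbin{;} [\![\alpha]\!] \mathbin{;} \mathsf{assocl}_{\times}^{\dagger}$$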
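The two constructions above can be replayed with ordinary Python functions. This is a sketch under simplifying assumptions, not the Agda development: a bijection is a `Bij(fwd, bwd)` pair, products are tuples, sums are tagged tuples such as `("inl", x)`, and the reader arrows `add_r`, `sub_r`, and `add_snd` are hypothetical examples.

```python
from collections import namedtuple

Bij = namedtuple("Bij", ["fwd", "bwd"])

def seq(*cs):
    """Sequential composition; the inverse runs the stages in reverse order."""
    def fwd(x):
        for c in cs:
            x = c.fwd(x)
        return x
    def bwd(y):
        for c in reversed(cs):
            y = c.bwd(y)
        return y
    return Bij(fwd, bwd)

def dagger(c):
    return Bij(c.bwd, c.fwd)

def plus(c1, c2):
    """Parallel composition on sums: act on the side selected by the tag."""
    def side(direction):
        def go(v):
            tag, x = v
            c = c1 if tag == "inl" else c2
            return (tag, getattr(c, direction)(x))
        return go
    return Bij(side("fwd"), side("bwd"))

# distr : (A + B) x C <-> A x C + B x C
distr = Bij(lambda p: (p[0][0], (p[0][1], p[1])),
            lambda v: ((v[0], v[1][0]), v[1][1]))
# assocl : A x (B x C) <-> (A x B) x C
assocl = Bij(lambda p: ((p[0], p[1][0]), p[1][1]),
             lambda p: (p[0][0], (p[0][1], p[1])))

def interp_case(a1, a2):
    return seq(distr, plus(a1, a2), dagger(distr))

def interp_pin(a):
    return seq(assocl, a, dagger(assocl))

# Example reader arrows: each returns its context (first component) unchanged.
add_r = Bij(lambda p: (p[0], p[1] + p[0]), lambda p: (p[0], p[1] - p[0]))
sub_r = dagger(add_r)
add_snd = Bij(lambda p: (p[0], p[1] + p[0][1]), lambda p: (p[0], p[1] - p[0][1]))
```

Note that neither construction inspects how its argument arrows work; both only rearrange the data around them.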


Partially invertible composition. Given $\mu$ and $\alpha$ with
$[\![\mu]\!] : C \rightleftharpoons G \otimes D$ and
$[\![\alpha]\!] : D \otimes A \rightleftharpoons D \otimes B$, we must construct

$$[\![\mu \ggg! \alpha]\!] : C \otimes A \rightleftharpoons C \otimes B.$$

The basic idea is to run $[\![\mu]\!]$ to produce a $D$-typed value to run
$[\![\alpha]\!]$ on; however, this brings with it unwanted garbage. Fortunately,
since $[\![\alpha]\!]$ is a reversible reader arrow, it is guaranteed to preserve
the $D$-component, meaning that after running it we have the same $D$ and
$G$-values available to us as before. These can be turned back into the original
$C$ value by running $[\![\mu]\!]$ backwards, giving the transformation required.

$$
[\![\mu \ggg! \alpha]\!] =
[\![\mu]\!] \otimes \mathsf{id} \mathbin{;} \mathsf{assocl}_{\times}^{\dagger} \mathbin{;} \mathsf{id} \otimes [\![\alpha]\!] \mathbin{;} \mathsf{assocl}_{\times} \mathbin{;} [\![\mu]\!]^{\dagger} \otimes \mathsf{id}
$$

Note that this is precisely the construction underlying the reversible updates [5]
of imperative reversible languages, and that $[\![\alpha]\!]$ preserving the
context is crucial for the construction to succeed.
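The compute, use, uncompute pattern of this construction can be traced step by step in Python. The sketch assumes bijections are `Bij(fwd, bwd)` pairs over tuples; the example arrows `length` (a function simulated with its input kept as garbage) and `add_r` are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

Bij = namedtuple("Bij", ["fwd", "bwd"])

def seq(*cs):
    def fwd(x):
        for c in cs:
            x = c.fwd(x)
        return x
    def bwd(y):
        for c in reversed(cs):
            y = c.bwd(y)
        return y
    return Bij(fwd, bwd)

def dagger(c):
    return Bij(c.bwd, c.fwd)

def tensor(c1, c2):
    return Bij(lambda p: (c1.fwd(p[0]), c2.fwd(p[1])),
               lambda p: (c1.bwd(p[0]), c2.bwd(p[1])))

ident = Bij(lambda x: x, lambda x: x)
# assocl : A x (B x C) <-> (A x B) x C
assocl = Bij(lambda p: ((p[0], p[1][0]), p[1][1]),
             lambda p: (p[0][0], (p[0][1], p[1])))

def interp_comp(mu, a):
    """mu x id ; assocl^dagger ; id x a ; assocl ; mu^dagger x id"""
    return seq(tensor(mu, ident), dagger(assocl), tensor(ident, a),
               assocl, tensor(dagger(mu), ident))

def _length_bwd(pair):
    g, d = pair
    if len(g) != d:
        raise ValueError("garbage and result disagree")
    return g

# len as a bijection C <-> G x D with G = C (the input kept as garbage).
length = Bij(lambda c: (c, len(c)), _length_bwd)
# Reader arrow D x A <-> D x B: adds d and returns d unchanged.
add_r = Bij(lambda p: (p[0], p[1] + p[0]), lambda p: (p[0], p[1] - p[0]))

add_length = interp_comp(length, add_r)
```

If `add_r` altered its first component, the final `dagger(length)` step would be handed a mismatched pair and fail, which is the role context preservation plays in the construction.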

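Running invertible computations. Given $\alpha$ with
$[\![\alpha]\!] : C \otimes A \rightleftharpoons C \otimes B$, we must produce

$$[\![\mathsf{run}\ \alpha]\!] : C \otimes A \rightleftharpoons G \otimes B,$$

for some $G$. Clearly it suffices to take $[\![\alpha]\!]$ with $G = C$, and we
are done.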


4.4 Correctness 


We now state the desired correctness properties of our locally invertible
interpretation, which show that it is equivalent to the direct semantics of
Figure 4 and that $[\![\alpha]\!]$ is indeed a reversible reader arrow (i.e., it
preserves the context $C$).


Theorem 3 (RRARR → Π° Soundness).
— $\mu\ w_1 \mapsto w_2$ implies $[\![\mu]\!]\ w_1 \mapsto (g, w_2)$ for some $g$.
— $\alpha\ w; w_1 \mapsto w_2$ implies $[\![\alpha]\!]\ (w, w_1) \mapsto (w, w_2)$.


Theorem 4 (RRARR → Π° Completeness).
— $[\![\mu]\!]\ w_1 \mapsto (g, w_2)$ implies $\mu\ w_1 \mapsto w_2$.
— $[\![\alpha]\!]\ (w, w_1) \mapsto (w', w_2)$ implies $w = w'$ and $\alpha\ w; w_1 \mapsto w_2$.


The theorems do not refer to the backward evaluation directly, utilizing the
invertibility of both RRARR and Π°.


4.5 Higher-order Computation 


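The previous sections laid out the fundamental ideas for representing partial
invertibility in a locally invertible setting. However, with RRARR being
first-order, it is not sufficient to be able to interpret KALPIS in a simple way. In
this section, we extend the language with four new combinators enabling proper
higher-order computation, shown in Figure 6. The combinators $\mathsf{curry}$ and
$\mathsf{app}$ are the standard currying and evaluation maps, creating and applying
functions $A \to B$. Their invertible counterparts $\mathsf{curry}^{\bullet}$ and
$\mathsf{app}^{\bullet}$ provide the final core construct from Section 2:
abstraction and application of invertible computations. They operate over
parameterized bijections, abstracting the parameter to get a bijection value
$A \leftrightarrow B$. The values are extended accordingly with two new
closure forms $\langle \mu, w \rangle : A \to B$ and
$\langle \alpha, w \rangle : A \leftrightarrow B$, where
$\mu : C \otimes A \rightsquigarrow B$, $\alpha : C \cdot A \leftrightsquigarrow B$,
and $w : C$, representing staged unidirectional and invertible
computations, respectively.


$$
\begin{array}{rcl}
A, B & ::= & \cdots \mid A \to B \mid A \leftrightarrow B \\
\mu & ::= & \cdots \mid \mathsf{curry}\ \mu \mid \mathsf{app} \mid \mathsf{curry}^{\bullet}\ \alpha \\
\alpha & ::= & \cdots \mid \mathsf{app}^{\bullet} \\
w & ::= & \cdots \mid \langle \mu, w \rangle \mid \langle \alpha, w \rangle
\end{array}
$$

$$
\frac{\mu : C \otimes A \rightsquigarrow B}{\mathsf{curry}\ \mu : C \rightsquigarrow (A \to B)}
\qquad
\frac{\alpha : C \cdot A \leftrightsquigarrow B}{\mathsf{curry}^{\bullet}\ \alpha : C \rightsquigarrow (A \leftrightarrow B)}
$$

$$
\frac{}{\mathsf{app} : (A \to B) \otimes A \rightsquigarrow B}
\qquad
\frac{}{\mathsf{app}^{\bullet} : (A \leftrightarrow B) \cdot A \leftrightsquigarrow B}
$$

$$
\frac{}{(\mathsf{curry}\ \mu)\ w \mapsto \langle \mu, w \rangle}
\qquad
\frac{\mu\ (w, w_1) \mapsto w_2}{\mathsf{app}\ (\langle \mu, w \rangle, w_1) \mapsto w_2}
$$

$$
\frac{}{(\mathsf{curry}^{\bullet}\ \alpha)\ w \mapsto \langle \alpha, w \rangle}
\qquad
\frac{\alpha\ w; w_1 \mapsto w_2}{\mathsf{app}^{\bullet}\ \langle \alpha, w \rangle; w_1 \mapsto w_2}
$$


Fig. 6. Combinators for higher-order computation in RRARR.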

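Having higher-order computation in the invertible setting has been
challenging [2, 12, 39, 40]. Borrowing the idea from [39, 40], we address the issue
by leveraging the fact that the function and bijection values are only part of
invertible computations as parameters of parameterized bijections; hence, we only
need a limited form of higher-orderness. We extend Π° with two additional primitive
operations:

$$
\begin{array}{l}
\mathsf{curry}_{\rightleftharpoons} : (C \otimes A \rightleftharpoons C \otimes B) \to (C \rightleftharpoons C \otimes (A \leftrightarrow B)) \\
\mathsf{app}_{\rightleftharpoons} : ((A \leftrightarrow B) \otimes A) \rightleftharpoons ((A \leftrightarrow B) \otimes B)
\end{array}
$$

The former takes a combinator with an auxiliary piece of “state” $C$, and abstracts
it into a bijection given a value of $C$. The latter applies a bijection, and saves
it to enable reversing the operation later. To represent the values of type
$A \leftrightarrow B$ in Π°, we introduce a third form of closure
$\langle f, w \rangle$, where we have
$f : C \otimes A \rightleftharpoons C \otimes B$ and $w : C$. Then, the semantics of
$\mathsf{app}_{\rightleftharpoons}$ and $\mathsf{curry}_{\rightleftharpoons}$ are as
follows:

$$
\frac{\mathit{clos} = \langle f, w \rangle}{(\mathsf{curry}_{\rightleftharpoons}\ f)\ w \mapsto (w, \mathit{clos})}
\qquad
\frac{f\ (w, a) \mapsto (w', b)}{\mathsf{app}_{\rightleftharpoons}\ (\langle f, w \rangle, a) \mapsto (\langle f, w' \rangle, b)}
$$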


As before, the inverse semantics is symmetric; e.g., (curry⇔ f)† (w, clos) ⇒
w if clos = ⟨f, w⟩. The (non-total) invertibility of curry⇔ is trivial, as its inverse
fails unless its input matches the corresponding output; it is essentially a
unidirectional function embedded in the invertible world. Since observational equality
of closure values is undecidable, the equality check must rely on some other,
intensional (e.g., syntactic) equality. Practically, this means that the combinator
can only be used to create a closure and then subsequently undo the very same
closure. However, this does not pose an issue for the translation from RRARR,
where closures will only result from uses of curry and curry°, both of which are
unidirectional arrows (⇝). These unidirectional arrows will only be executed
backwards as part of partially invertible compositions (⋙!), which ensures that
the input is the same as the corresponding output.
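The behaviour of these two primitives can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: bijections are modelled as pairs of forward and backward functions, the names `Bij`, `Closure`, `curry_fwd`, `curry_bwd`, `app_fwd`, and `app_bwd` are hypothetical, and object identity of the combinator stands in for the intensional equality check mentioned above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Bij:
    """A (possibly partial) bijection, given as a forward/backward pair."""
    fwd: Callable[[Any], Any]
    bwd: Callable[[Any], Any]


@dataclass(frozen=True)
class Closure:
    """A closure <f, w>: a combinator f on (state, value) pairs plus a state w."""
    f: Bij
    w: Any


def curry_fwd(f: Bij, w: Any):
    # (curry f) w => (w, clos) with clos = <f, w>
    return (w, Closure(f, w))


def curry_bwd(f: Bij, pair):
    # The inverse only accepts the very closure that curry_fwd would create.
    # Identity of f stands in for an intensional equality check (assumption).
    w, clos = pair
    if clos.f is not f or clos.w != w:
        raise ValueError("closure does not match; the inverse is undefined here")
    return w


def app_fwd(pair):
    # app (<f, w>, a) => (<f, w'>, b) whenever f (w, a) => (w', b)
    clos, a = pair
    w2, b = clos.f.fwd((clos.w, a))
    return (Closure(clos.f, w2), b)


def app_bwd(pair):
    # Run f backwards, using the saved closure to recover the original input.
    clos, b = pair
    w, a = clos.f.bwd((clos.w, b))
    return (Closure(clos.f, w), a)


# Hypothetical example combinator: add the state to the value.
add_state = Bij(lambda p: (p[0], p[1] + p[0]), lambda p: (p[0], p[1] - p[0]))
w, clos = curry_fwd(add_state, 3)
out = app_fwd((clos, 5))  # the closure is kept alongside the result
```

Keeping the closure in the output of `app_fwd` is what makes the application reversible: `app_bwd(out)` has everything it needs to reconstruct the original argument.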

Now, we can interpret ⟦app⟧ = app⇔, ⟦app°⟧ = app⇔, and

\[
\llbracket \mathsf{curry}\ \mu \rrbracket = \mathsf{inl} \fatsemi \mathsf{curry}_{\Leftrightarrow}\,(\mathsf{inl}^{\dagger} \otimes \mathsf{id} \fatsemi \llbracket \mu \rrbracket \fatsemi \mathsf{inr} \otimes \mathsf{id}),
\qquad
\llbracket \mathsf{curry}^{\circ}\ \alpha \rrbracket = \mathsf{curry}_{\Leftrightarrow}\ \llbracket \alpha \rrbracket.
\]

The former construction curries ⟦μ⟧ : C ⊗ A ⇔ G ⊗ B given w : C by creating
a one-shot closure ⟨f, inl w⟩ which turns into ⟨f, inr g⟩ for g : G when first
applied, and fails on a second application.

The theorems of Section 4.4 extend without difficulty to the higher-order
combinators, although the statement is somewhat more intricate due to the
differing set of closure values between RRARR and Π°. We refer to the mechanized
formalization in Agda for details.


5 Interpreting KALPIS with Arrows 


Theorem 1 (Section 3.6) suggests that a unidirectional term-in-context Γ ⊢ u : A
can be seen as a function from Γ to A, and that an invertible term-in-context
Γ; Θ ⊢ r : A can be seen as a bijection between Θ and A parameterized by
Γ. It is then natural to relate them to the two arrows (− ⇝ −) and
(− · − ↭ −) of RRARR, respectively. In this section, we give a formal account
of this relation by translating terms of KALPIS into RRARR, giving by extension
a compositional locally invertible interpretation of KALPIS.
We first define some operations on typing contexts. We define Γ^× as

\[
(x_1 : A_1, \ldots, x_n : A_n)^{\times} = (((1 \otimes A_1) \otimes A_2) \otimes \cdots) \otimes A_n.
\]

It is straightforward to define an operator lookupₓ : Γ^× ⇝ A provided that
Γ(x) = A. We also use a combinator split_{Θ₁,Θ₂} : (Θ₁ ⊎ Θ₂)^× ⇔ Θ₁^× ⊗ Θ₂^× for
splitting the linear environments. Then, we give two type-directed transformations:
Γ ⊢ u : A ⇢ μ that transforms u to μ of type Γ^× ⇝ A, and Γ; Θ ⊢ r : A ⇢ α
that transforms r to α of type Γ^× · Θ^× ↭ A. For the purposes of the translation,
we consider a fixed set of type constructors A, B ::= 1 | A ⊗ B | A ⊕ B | Rec_A,
identifying μX.A with Rec_A.
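The left-nested encoding of contexts and the lookup operator can be pictured with a short Python sketch. This is an informal illustration only: tuples stand in for products, `()` stands in for the unit type, and the function names `encode` and `lookup_var` are hypothetical.

```python
def encode(values):
    """Encode [a1, ..., an] as the left-nested tuple ((((), a1), a2), ...), an)."""
    env = ()
    for v in values:
        env = (env, v)
    return env


def lookup_var(names, x):
    """Return a projection that extracts variable x from an encoded environment.

    `names` lists the variables of the context in order; if x is bound more
    than once, the rightmost binding is used (an assumption of this sketch).
    """
    index = len(names) - 1 - names[::-1].index(x)  # position of the rightmost x
    steps = len(names) - 1 - index                 # how often to descend left

    def project(env):
        for _ in range(steps):
            env = env[0]
        return env[1]

    return project


names = ["x", "y", "z"]
env = encode([1, 2, 3])
```

Here `lookup_var(names, "x")` descends twice into the left component before taking the right one, mirroring how a projection out of (((1 ⊗ A₁) ⊗ A₂) ⊗ A₃) would be composed.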

Without loss of generality, we drop unnecessary with-conditions, so that
a case°-expression with one branch needs no with-clause, and one with two
branches needs only one clause. Due to space limitations, we present only the
most representative cases here, and point the interested reader to the mechanized
formalization in Agda.¹⁰


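10 https://git.sr.ht/~aathn/kalpis-agda

Case T-UCASE (A ⊕ B).

\[
\dfrac{\Gamma \vdash u : A \oplus B \dashrightarrow \mu
\qquad \Gamma, x : A;\ \Theta \vdash r_1 : C \dashrightarrow \alpha_1
\qquad \Gamma, y : B;\ \Theta \vdash r_2 : C \dashrightarrow \alpha_2}
{\begin{array}{l}
\Gamma;\ \Theta \vdash \mathsf{case}\ u\ \mathsf{of}\ \mathsf{InL}\ x \to r_1;\ \mathsf{InR}\ y \to r_2 : C \dashrightarrow \\
\qquad (\mathsf{clone} \ggg_u \mathsf{first}_u\ \mu \ggg_u \mathsf{arr}_u\ (\mathsf{swap} \fatsemi \mathsf{distl})) \ggg^{!} \mathsf{case}^{!}\ \alpha_1\ \alpha_2
\end{array}}
\]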


We can duplicate Γ^× using clone and use one copy to construct A ⊕ B with
μ. Using distl : A ⊗ (B ⊕ C) ⇔ A ⊗ B ⊕ A ⊗ C, which is easily derived, we
distribute the second copy of Γ^× over the sum. Then, the required combinator can
be constructed through a combination of partially invertible composition (⋙!)
and branching (case!), where we have case! α₁ α₂ : (Γ^× ⊗ A ⊕ Γ^× ⊗ B) · Θ^× ↭ C.


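Case T-RCASE (A ⊕ B).

\[
\dfrac{\begin{array}{c}
\Gamma;\ \Theta_1 \vdash r_1 : A \oplus B \dashrightarrow \alpha_1
\qquad \Gamma;\ \Theta_2, x : A \vdash r_2 : C \dashrightarrow \alpha_2 \\
\Gamma;\ \Theta_2, y : B \vdash r_3 : C \dashrightarrow \alpha_3
\qquad \Gamma \vdash u : C \to \mathsf{Bool} \dashrightarrow \mu
\end{array}}
{\begin{array}{l}
\Gamma;\ \Theta_1 \uplus \Theta_2 \vdash \mathsf{case}^{\circ}\ r_1\ \mathsf{of}\ \mathsf{InL}\ x \to r_2;\ \mathsf{InR}\ y \to r_3\ \mathsf{with}\ u : C \dashrightarrow \\
\qquad \mathsf{arr}_r\ \mathsf{split}_{\Theta_1,\Theta_2} \ggg_r \mathsf{first}_r\ \alpha_1 \ggg_r \mathsf{arr}_r\ (\mathsf{swap} \fatsemi \mathsf{distl}) \ggg_r \mathsf{case}\ \alpha_2\ \alpha_3\ (\mathsf{mkCond}\ \mu)
\end{array}}
\]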


The idea is similar to T-UCASE, but we now operate in the invertible world, so
we split (Θ₁ ⊎ Θ₂)^× instead of duplicating Γ^×, and compose using ⋙ᵣ instead of
⋙!. The combinator case α₁ α₂ α₃ ≜ leftᵣ α₁ ⋙ᵣ rightᵣ α₂ ⋙ᵣ α₃† with type

\[
\mathsf{case} : (D \cdot A \leftrightsquigarrow C) \to (D \cdot B \leftrightsquigarrow C) \to (D \cdot C \leftrightsquigarrow C \oplus C) \to D \cdot (A \oplus B) \leftrightsquigarrow C,
\]

provides an invertible branching operator analogous to case!, with a
postcondition for merging the branches. We convert μ : Γ^× ⇝ (C → Bool) to an arrow
mkCond μ : Γ^× · C ↭ C ⊕ C through the mkCond operator, which can be
defined using pin, case! and app in tandem.

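Cases T-ABS°, T-RAPP.

\[
\dfrac{\Gamma;\ x : A \vdash r : B \dashrightarrow \alpha}
{\Gamma \vdash \lambda^{\circ} x.\, r : A \leftrightarrow B \dashrightarrow \mathsf{curry}^{\circ}\ (\mathsf{arr}_r\ \mathsf{unite}^{\dagger} \ggg_r \alpha)}
\qquad
\dfrac{\Gamma \vdash u : A \leftrightarrow B \dashrightarrow \mu
\qquad \Gamma;\ \Theta \vdash r : A \dashrightarrow \alpha}
{\Gamma;\ \Theta \vdash u \triangleleft r : B \dashrightarrow \alpha \ggg_r (\mu \ggg^{!} \mathsf{app}^{\circ})}
\]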


For T-ABS°, we get α : Γ^× · 1 ⊗ A ↭ B, which we curry° after handling the unit.
For T-RAPP, α transforms Θ^× to A, letting μ be applied through a partially
invertible composition (⋙!) with app°.


Case T-PIN. 
\[
\dfrac{\Gamma \vdash u : C \to A \leftrightarrow B \dashrightarrow \mu
\qquad \Gamma;\ \Theta \vdash r : C \otimes A \dashrightarrow \alpha}
{\Gamma;\ \Theta \vdash \mathsf{pin}\ u\ r : C \otimes B \dashrightarrow \alpha \ggg_r \mathsf{pin}\ ((\mathsf{first}_u\ \mu \ggg_u \mathsf{app}) \ggg^{!} \mathsf{app}^{\circ})}
\]

We have α producing C ⊗ A, and with parameter Γ^× ⊗ C, we can apply μ to
produce B. Thus, we must shift C from the output into the parameter, and pin
achieves just that.

Correctness. Finally, we show the correctness of the translation with respect to
the semantics of Sections 3.5 and 4.2. Before we state correctness, we must first
define a translation of the values, since they differ between KALPIS and RRARR.


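\[
\begin{array}{c}
\llbracket () \rrbracket = (), \qquad \llbracket (v_1, v_2) \rrbracket = (\llbracket v_1 \rrbracket, \llbracket v_2 \rrbracket), \\
\llbracket \mathsf{InL}\ v \rrbracket = \mathsf{inl}\ \llbracket v \rrbracket, \qquad
\llbracket \mathsf{InR}\ v \rrbracket = \mathsf{inr}\ \llbracket v \rrbracket, \qquad
\llbracket \mathsf{Roll}\ v \rrbracket = \mathsf{roll}\ \llbracket v \rrbracket, \\
\llbracket \langle \lambda x.\, u, \gamma \rangle \rrbracket = \langle \llbracket u \rrbracket, \llbracket \gamma \rrbracket \rangle, \qquad
\llbracket \langle \lambda^{\circ} x.\, r, \gamma \rangle \rrbracket = \langle \mathsf{arr}_r\ \mathsf{unite}^{\dagger} \ggg_r \llbracket r \rrbracket, \llbracket \gamma \rrbracket \rangle
\end{array}
\]

The base values are translated trivially, whereas the closures are translated
according to the type-directed translation given above (cf. Case T-ABS°). We also
define a translation of value environments γ in the obvious way.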

Then, we can state the correctness of the translation as below. 


Theorem 5 (KALPIS ⇢ RRARR Soundness).
- Γ ⊢ u : A ⇢ μ and γ ⊢ u ⇓ v implies μ ⟦γ⟧ ⇒ ⟦v⟧.
- Γ; Θ ⊢ r : A ⇢ α and γ; θ ⊢ r ⇒ v implies α ⟦γ⟧; ⟦θ⟧ ⇒ ⟦v⟧.


This theorem does not refer to the backward evaluation directly, utilizing the
invertibility of both KALPIS and RRARR. The completeness part, on the other
hand, does need a separate statement for the backward direction, since there is
no a priori guarantee that the output w is of the form ⟦θ⟧.


Theorem 6 (KALPIS ⇢ RRARR Completeness).
- Γ ⊢ u : A ⇢ μ and μ ⟦γ⟧ ⇒ w implies γ ⊢ u ⇓ v for v with ⟦v⟧ = w.
- Γ; Θ ⊢ r : A ⇢ α and α ⟦γ⟧; ⟦θ⟧ ⇒ w implies γ; θ ⊢ r ⇒ v for v with
  ⟦v⟧ = w.
- Γ; Θ ⊢ r : A ⇢ α and α ⟦γ⟧; ⟦v⟧ ⇐ w implies γ; v ⊢ r ⇐ θ for θ with
  ⟦θ⟧ = w.


We refer to the Agda code in the supplementary material for the proofs. 


6 Related Work 


KALPIS and RRARR are not the first to support partial invertibility. In the
imperative setting, languages such as Janus [35, 53], Frank’s R [17], and R-While
support a limited form of partial invertibility via reversible update operators [6].
An example of a reversible update statement is x += e, whose effect can be re- 
verted by the corresponding inverse statement x -= e. Both statements use the 
same e, which need not be invertible (e.g., x += yz is reverted by x -= yz, and 
vice versa). In the functional setting, Theseus allows a bijection to take ad- 
ditional parameters, but only provided that they are available at compile time. 
RFun version 2,¹¹ an extension to the original RFun [54], and CoreFun allow
more flexibility via so-called ancilla parameters, which are translated to auxiliary
inputs and outputs of the invertible computation. Their approach is similar to
KALPIS’s but more restrictive since they lack support for the pin operator and


11 https://github.com/kirkedal/rfun-interp
higher-order computation. Jeopardy is a recent invertible language where
even irreversible functions can be inverted in certain contexts depending on im- 
plicitly available information. However, this is still work in progress, and seems 
to lean closer to program inversion methods than the lightweight type-based 
approach we employ. 
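The reversible update statements mentioned above (x += e undone by x -= e) can be mimicked in a few lines of Python. This is a minimal sketch rather than the semantics of any of the cited languages: program states are plain dictionaries, the names `update_add` and `update_sub` are hypothetical, and the sketch assumes that the expression e does not read the variable being updated.

```python
def update_add(state, x, e):
    """Forward statement `x += e`; e is a function of the state that ignores x."""
    new = dict(state)
    new[x] = state[x] + e(state)
    return new


def update_sub(state, x, e):
    """Inverse statement `x -= e`, using the very same expression e."""
    new = dict(state)
    new[x] = state[x] - e(state)
    return new


# e itself (a product of two variables) is not invertible, yet the update is.
product = lambda s: s["y"] * s["z"]
before = {"x": 1, "y": 3, "z": 4}
after = update_add(before, "x", product)
restored = update_sub(after, "x", product)
```

Because e evaluates to the same value before and after the update (it does not depend on x), subtracting it restores the original state exactly.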


SPARCL is, to our knowledge, the most flexible system that supports partial
invertibility, which is realized through a more advanced language foundation.
Instead of bijections A ↔ B, SPARCL features invertible data marked by
the type A•, which implicitly corresponds to some bijection S ↔ A. This idea
of invertible data is inherited from the HOBiT language [38], which represents
lens combinators as higher-order functions to achieve applicative-style
higher-order bidirectional programming [36, 37]. The type system of SPARCL
ensures that a closed linear function between invertible data !(A• ⊸ B•) is
isomorphic to a (non-total) bijection between A and B, so that partial invertibility
can be represented as a function that takes both unidirectional and invertible
data C → A• ⊸ B•. This representation affords more flexibility than KALPIS
does: invertible data is allowed to be captured in abstractions, and can even
appear in subcomponents of datatypes (e.g., Int ⊗ (Int•) or Int ⊕ (Int•) are both
valid types). However, this flexibility comes at the cost of complexity, requiring a
semantics that interleaves partial evaluation and invertible computation, making
a locally invertible interpretation difficult. We remark that the holed residuals
featured in SPARCL’s core system bear a strong resemblance to bijections
λ°x.r in KALPIS.


Our combinator language RRARR can be seen as an extension of ML_Π, an
arrow metalanguage on top of the invertible language Π treating information
creation and loss (non-totality and irreversibility) as an effect [26]. By combining
their work with the reversible reader arrow [23], we are able to give erasing
(weakening) as a derived operation defined via the operator run (as demonstrated
in Section 4). Further research on the nontrivial interaction between the
arrows, such as an equational characterization and a denotational model, is left 
for future work. While the previous work is able to treat non-totality as part 
of an effect, we assume some non-total operations in the underlying invertible 
system due to the inclusion of recursive and functional types. 


The design of KALPIS is inspired by the arrow calculus of Lindley, Wadler,
and Yallop [33], which is a metalanguage for the conventional representation of
arrows [24], analogous to the monad metalanguage [42]. In a sense, KALPIS can
be seen as a counterpart to the arrow calculus for RRARR. For example, the
treatment of λ°x.r is actually inherited from the arrow calculus, where arrows cannot
be nested in general [34], unless the underlying arrow supports application to
form a monad. To the best of our knowledge, a monad-based programming
system for invertible/reversible computation does not exist, though there are 
some closely related results, including monads for nondeterministic computation 
(such as [14]) and a monadic programming framework for bidirectional
transformations [20, 52]. However, these existing approaches lack the guarantee of
bijectivity, which is a motivation to use invertible languages.

The importance of partial invertibility has been recognized in the neighboring
literature on program inversion, i.e., program transformations that derive a program
of f⁻¹ for a given program of f. Partial inversion essentially applies a
binding-time analysis to an input program, where the static data can
be treated as unidirectional inputs. The technique is further extended to treat
results of inverses as unidirectional [29, 30]. This treatment is similar to the
role of pin in KALPIS and SPARCL in that it converts invertible data into 
“static” parameters. Some approaches to program inversion are more liberal: semi 
inversion essentially converts a program into a logic program, where there 
is no clear boundary between unidirectional and invertible data, and the PINS 
system [9], in addition to an original program, can take a control structure of 
an inverse program to effectively synthesize inverses that may not mirror the 
control structures of the original. The main limitation of program inversion is 
that as a program transformation it may fail, often for reasons that are not 
obvious to programmers. 


7 Conclusion 


We have presented a set of four core constructs for partially invertible program- 
ming, demonstrated their expressiveness through examples, and shown that they 
can be given a locally invertible interpretation, thus solving an open problem in 
the field. The four constructs are (1) partially invertible branching, (2) pinning 
invertible inputs, (3) partially invertible composition, and (4) abstraction and 
application of invertible computations. We designed the partially invertible lan- 
guage KALPIS on top of these constructs and formalized its syntax, type system 
and operational semantics. We then presented RRARR, a low-level arrow lan- 
guage with primitives directly corresponding to the constructs, and gave it a lo- 
cally invertible interpretation based on two effects—the irreversibility effect 
and the reversible reader [23]. Finally, we presented a type-directed translation 
from KALPIS to RRARR, showing how to support expressive partial invertibility 
on top of a locally invertible foundation. Proofs of all theorems stated in the 
paper are formalized by the accompanying Agda code.


Abstract. Regular expression (regex) matching is fundamental in many
applications, especially in web services. However, matching by backtrack- 
ing—preferred by most real-world implementations for its practical per- 
formance and backward compatibility—can suffer from so-called catas- 
trophic backtracking, which makes the number of backtracking super- 
linear and leads to the well-known ReDoS vulnerability. Inspired by 
a recent algorithm by Davis et al. that runs in linear time for (non- 
extended) regexes, we study efficient backtracking matching for regexes 
with two common extensions, namely look-around and atomic grouping. 
We present linear-time backtracking matching algorithms for these ex- 
tended regexes. Their efficiency relies on memoization, much like the one 
by Davis et al.; we also strive for smaller memoization tables by care- 
fully trimming their range. Our experiments—we used some real-world 
regexes with the aforementioned extensions—confirm the performance 
advantage of our algorithms. 


Keywords: regular expression · look-around · atomic grouping ·
pattern matching · ReDoS · memoization


1 Introduction 


Regex Matching Regular expressions (regexes) are a fundamental formalism 
for various pattern-matching tasks. Many regex matching implementations, how- 
ever, suffer from occasional super-linear growth of their execution time. Such ex- 
cessive execution time can be exploited for DoS attacks—this is a vulnerability 
called regex denial of service (ReDoS). ReDoS is recognized as a significant se- 
curity concern in many real-world systems, especially web services such as Stack 
Overflow and Cloudflare (see §2.4 for more details). 


Need for Efficient Backtracking Regex Matching The principal cause 
of ReDoS is catastrophic backtracking, that is, the explosion of recursion in a 
backtracking-based matching algorithm.

In regex matching, in general, a regex r is converted into a non-deterministic
finite automaton (NFA) A, and the latter is executed for an input string w. The 
non-determinism of A can be resolved in either a depth-first or a breadth-first 
manner. The former is called backtracking regex matching, and the latter is the 
on-the-fly DFA construction. 

Catastrophic backtracking and ReDoS are phenomena unique to the former
(i.e., backtracking); as is well known, the time complexity of the on-the-fly DFA
construction is linear (i.e., O(|w|)). Indeed, many modern regex implementations
are based on the on-the-fly DFA construction, including RE2³, Go’s regexp⁴,
and Rust’s regex⁵.

It is practically essential, however, to make backtracking regex matching 
more efficient. A principal reason is consistency. Most existing regex matching 
implementations use backtracking, and they return only one matching position 
out of many (see §2.3). While it is possible to replace them with on-the-fly 
DFA matching, it is non-trivial to ensure consistency, that is, that the chosen 
matching position is the same as the original backtracking matching
implementation. .NET’s regex implementation has a linear-time backend using
a derivative-based approach, which is compatible with a backtracking backend.
Still, it does not support look-around and atomic grouping [28]. Once the
returned matching position changes, it can unexpectedly affect the behavior of all
the systems (e.g., web services) that use regex matching. 

Another reason for improving backtracking regex matching is its extensibility. 
There are many extensions of regexes widely used—such as the ones we study, 
namely look-around and atomic grouping—and they are supported by few on- 
the-fly DFA matching implementations. 


Existing Work: Linear-time Backtracking Matching with Memoization 
Memoization is a well-known technique for speeding up recursive computations. 
The recent work [10] shows that memoization can be applied to backtracking 
regex matching with consistency in mind. Specifically, the work [10] presents 
a backtracking matching algorithm that runs in O(|w|) time—thus, it is the- 
oretically guaranteed to avoid catastrophic backtracking—for regexes without 
extensions. (They also mention application to extended regexes in [10], but we 
found issues in their discussion—see Remark 2). 
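The general idea of memoizing failed (state, position) pairs can be sketched in Python as follows. This is not the algorithm of [10] itself, only a minimal illustration for regexes without extensions; the tuple encoding of transitions, the example NFA layout for (a|a)*b, and the names `match_memo`, `EXAMPLE_T`, and `EXAMPLE_F` are assumptions made for this sketch.

```python
# Hypothetical NFA for (a|a)*b. Transitions are encoded as tuples:
#   ("eps", q1), ("branch", q1, q2), ("char", c, q1)
EXAMPLE_T = {
    0: ("branch", 1, 6),   # enter the loop body first, otherwise leave the loop
    1: ("branch", 2, 4),   # the two alternatives of a|a
    2: ("char", "a", 3),
    3: ("eps", 0),
    4: ("char", "a", 5),
    5: ("eps", 0),
    6: ("char", "b", 7),
}
EXAMPLE_F = {7}


def match_memo(T, F, w, q, i, failed, stats):
    """Backtracking matching that records failures in the set `failed`.

    Returns the matching position (an int) on success, or None on failure.
    `stats["calls"]` counts every invocation, including memoization hits.
    """
    stats["calls"] += 1
    if (q, i) in failed:          # this configuration is already known to fail
        return None
    if q in F:
        return i
    t = T[q]
    if t[0] == "eps":
        result = match_memo(T, F, w, t[1], i, failed, stats)
    elif t[0] == "branch":
        result = match_memo(T, F, w, t[1], i, failed, stats)
        if result is None:
            result = match_memo(T, F, w, t[2], i, failed, stats)
    else:  # "char"
        if i < len(w) and w[i] == t[1]:
            result = match_memo(T, F, w, t[2], i + 1, failed, stats)
        else:
            result = None
    if result is None:
        failed.add((q, i))        # only failures are recorded
    return result
```

Since each (state, position) pair can fail at most once before it is served from the table, the number of calls grows linearly with the input length for a fixed NFA, and the first successful branch is still the one that is returned.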


Our Contribution: Linear-time Backtracking Matching for Some Ex- 
tended Regexes In this paper, we present a linear-time backtracking match- 
ing algorithm for regexes with look-around and atomic grouping, two real-world 
extensions of regexes. It uses memoization in order to achieve a linear-time com- 
plexity. We also prove that it is consistent (i.e., it chooses the same matching 
position as the original algorithm without memoization). 

The technical key to our algorithm is the design of suitable memoization ta- 
bles. We follow the general idea in [10] of using memoization for backtracking 
matching, but our examination of its issues with extended regexes (Remark 2)
shows that the range (i.e., the set of possible entries) of memoization tables
should be suitably extended. Specifically, the range in [10] is {false},
recording only matching failures; it is extended in our algorithm to
{Failure(j) | j ∈ {0, …, ν(A)}} ∪ {Success}. Here, ν(A) is the maximum nesting depth of atomic
grouping for the (extended) NFA A, defined in §5.


3 https://github.com/google/re2
4 https://pkg.go.dev/regexp
5 https://docs.rs/regex/latest/regex/

Our development is rigorous and systematic, based on the notion of NFA 
whose labels can themselves be NFAs. This extended notion of NFA is suggested 
in [10, Section IX.B]; in this paper, we formalize it and build its theory. 

We experimentally evaluate our algorithm; the experiment results confirm its 
performance advantages. Additionally, we survey the usage status of look-around 
and atomic grouping—two regex extensions of our interest—in real-world regexes 
and demonstrate their wide usage (§6). 


Technical Contributions We summarize our technical contributions. 


- We propose a backtracking matching algorithm for regexes with look-around,
  proving its linear-time complexity (§4). This algorithm fixes the issues in the
  algorithm in [10] (Remark 2) and restores correctness and linearity.

- We also propose a backtracking matching algorithm for regexes with atomic
  grouping, proving its linear-time complexity (§5).

- We experimentally confirm the performance of our algorithms (§6).

- We investigate the usage status of look-around and atomic grouping in
  real-world regexes and confirm their wide usage (§6).

- We establish a rigorous theoretical basis for our algorithms for extended
  regexes, namely NFAs with sub-automata (§2.6).


Organization We provide some preliminaries in §2, such as regex extensions 
of our interest. Our formalization of NFAs with sub-automata is also presented 
here. In §3, we discuss the work [10] that is closest to ours. We present our 
matching algorithm for regex with look-around in §4 and the one for regex with 
atomic grouping in §5. Then, we discuss our implementation and experimental 
evaluation in §6. We conclude in §7. 

Some additional proofs and other materials are deferred to the appendices in 
the extended version [15]. 


Related Work Many related works are discussed elsewhere in suitable contexts. 
Here, we discuss other related works. 

There are many theoretical studies on look-around and atomic grouping. The 
work [27] is a theoretical study of look-ahead operators; it shows how to convert 
them to finite automata. Another conversion based on derivatives is introduced 
in [26]. The work [3] conducts a fine-grained analysis of the size of DFAs ob- 
tained from converting regexes with look-ahead, improving the bounds given 
in [26,27]. The work [5] discusses the relation between look-ahead operators and 
back-references in regexes. A recent study [22] presents a linear-time matching 
algorithm for regexes with look-around; it uses a memoization-like construct for 
efficiency. However, the compatibility with backtracking is not a concern there, 
unlike the current work. On atomic grouping, conversion to finite automata is 
proposed [4], where atomic grouping is simulated by look-ahead.

Another common regex extension is back-reference. We do not deal with this
extension because 1) this extension is known to be non-regular (i.e., the language 
class defined by back-reference is beyond regular), and 2) its matching problem 
is known to be NP-complete [1] (thus the search for linear-time matching is 
doomed). There are other extensions (absent operators, conditional branching, 
etc.), but they are used less often (cf. §6). 

ReDoS countermeasures are an active scientific topic. Besides efficient match- 
ing, there are two directions for them: ReDoS detection and ReDoS repair. Re- 
DoS detection is a problem that determines whether a given regex can cause 
catastrophic backtracking. This can be done by finding specific structures in a 
transition diagram of an automaton [2, 18,29, 34,36, 37]. Besides, dynamic anal- 
ysis, such as fuzzing [31], and combinations of static and dynamic analyses [19] 
are studied. ReDoS repair is a problem of modifying a given regex so that it 
does not cause ReDoS. Known solutions include exploring ReDoS-free regexes 
using SMT solvers [6,21] and rule-based rewriting of vulnerable regexes [20]. 
These ReDoS detection and repair measures are computationally demanding, 
and their real-world deployment is limited. 

There are other implementation-level studies on speeding up regex matching, 
such as Just-in-Time (JIT) compilation [17] and FPGA [32]. However, these 
studies are not intended to prevent catastrophic backtracking. 


2 Preliminaries 


We introduce preliminaries for this paper. Firstly, we present some basic con- 
cepts such as regexes, NFAs, conversion from regexes to NFAs, and backtracking 
matching. We then discuss catastrophic backtracking and the ReDoS vulnerabil- 
ity that it can cause. Finally, we introduce look-around and atomic grouping as 
practical regex extensions and NFAs with sub-automata for these extensions. 

We fix a finite set Σ as an alphabet throughout this paper. We call sequences
of elements of Σ strings. The empty string is denoted by ε. For a string w =
σ₀σ₁⋯σₙ₋₁, the length of w, denoted by |w|, is defined as |w| = n. We also
write w[i] = σᵢ for i ∈ {0, …, n−1}.

We use partial functions for memoization. For two sets A and B, a partial
function G from A to B, denoted by G : A ⇀ B, is defined as a function G : A →
B ∪ {⊥}. Here ⊥ is the element for “undefined,” and it is assumed that ⊥ ∉ B.

Let G : A ⇀ B be a partial function, a ∈ A, and b ∈ B. We let G(a) ← b
denote an updated partial function: it carries a to b, and any other x ∈ A to
G(x) (it is undefined if G(x) is initially undefined).
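As a concrete reading of this notation, a partial function can be modelled by a dictionary together with a sentinel for “undefined”. The following Python sketch is only illustrative; the names `BOTTOM`, `lookup`, and `update` are hypothetical, and the update is written non-destructively so that the original function remains available.

```python
BOTTOM = object()  # sentinel standing for the "undefined" element


def lookup(G, x):
    """Return G(x), or BOTTOM if G is undefined at x."""
    return G.get(x, BOTTOM)


def update(G, a, b):
    """Return the partial function that maps a to b and agrees with G elsewhere."""
    H = dict(G)
    H[a] = b
    return H


# A memoization table keyed by (state, position) pairs, initially undefined everywhere.
G0 = {}
G1 = update(G0, ("q1", 0), "false")
```

An actual memoizing matcher would typically update its table in place; the copying version here simply mirrors the mathematical definition.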


2.1 Regexes 
Regular expressions (regexes) are defined by the following abstract grammar. 


\[
\begin{array}{rcll}
r &::=& \sigma & \text{(a (literal) character, where } \sigma \in \Sigma\text{)} \\
  &\mid& \varepsilon & \text{(the empty string)} \\
  &\mid& r|r & \text{(an alternation)} \\
  &\mid& r \cdot r & \text{(a concatenation)} \\
  &\mid& r^{*} & \text{(a repetition)}
\end{array}
\]

[Figure omitted: NFA construction diagrams; panel (d): r₁|r₂]

Fig. 1: a conversion from regexes to NFAs


The concatenation operator · may be omitted when there is no ambiguity. The
precedence of operators is as follows: repetition, concatenation, and alternation.
For example, ab*|c means (a · (b*))|c.

For a regex r, the size of r, denoted by |r|, is defined as follows: |σ| = |ε| = 1,
|(r₁|r₂)| = |r₁ · r₂| = |r₁| + |r₂| + 1, and |r*| = |r| + 1.
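The size function can be written down directly over an abstract syntax tree. The Python sketch below assumes a hypothetical AST with the classes `Char`, `Eps`, `Alt`, `Cat`, and `Star`; it computes the size of the example ab*|c by following the definition clause by clause.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    c: str


@dataclass(frozen=True)
class Eps:
    pass


@dataclass(frozen=True)
class Alt:
    left: object
    right: object


@dataclass(frozen=True)
class Cat:
    left: object
    right: object


@dataclass(frozen=True)
class Star:
    body: object


def size(r):
    """Size of a regex: 1 per leaf, plus 1 per operator node."""
    if isinstance(r, (Char, Eps)):
        return 1
    if isinstance(r, (Alt, Cat)):
        return size(r.left) + size(r.right) + 1
    if isinstance(r, Star):
        return size(r.body) + 1
    raise TypeError(f"not a regex: {r!r}")


# ab*|c parsed with the stated precedence: (a . (b*)) | c
example = Alt(Cat(Char("a"), Star(Char("b"))), Char("c"))
```

For this example, |b*| = 2, |a · b*| = 1 + 2 + 1 = 4, and the whole expression has size 4 + 1 + 1 = 6.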


2.2 NFAs 


A non-deterministic finite state automaton (NFA) is a quadruple (Q, q₀, F, T),
where Q is a finite set of states, q₀ ∈ Q is an initial state, F ⊆ Q is a set
of accepting states, and T is a transition function. For each q ∈ Q \ F, T(q)
can be one of the following: T(q) = Eps(q′), T(q) = Branch(q′, q″), and T(q) =
Char(σ, q′), where q′, q″ ∈ Q and σ ∈ Σ.

The above definition of a transition function T is tailored to our purpose of
backtracking. Compared to the common definition δ : Q × ({ε} ∪ Σ) → 2^Q, it
expresses general branching as combinations of certain elementary branchings,
namely one transition by ε, two transitions by ε, and one transition
by a certain character σ ∈ Σ. This makes the description of backtracking
matching easier. Note, in particular, that the successors q′, q″ in the
branching Branch(q′, q″) are ordered. Here, q′ and q″ are called the first and second
successors, respectively. This definition of transition functions is similar to the
op-codes of many real-world regex-matching implementations (cf. [8]).

We present a conversion from regexes to NFAs (see Figure 1); it is similar to
the Thompson–McNaughton–Yamada construction [23,35]. For a regex r, A(r)
denotes the NFA A converted from r. In the figure, labels on arrows show kinds
of transitions. In a Branch transition, the top arrow points to the first successor,
and the bottom points to the second successor. Rectangles indicate that the
conversion is applied to sub-expressions inductively. Because each case of this
construction introduces at most two new states, for a regex r and the NFA
A(r) = (Q, q₀, F, T), we have |Q| = O(|r|).
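A conversion in this spirit can be sketched in Python. The construction below is an assumption of this sketch and need not coincide with Figure 1 state for state: regexes are encoded as nested tuples, states are integers, and each regex node allocates at most one state, so that |Q| ≤ |r| + 1. The name `compile_regex` is hypothetical.

```python
def compile_regex(r):
    """Return (Q, q0, F, T) for a regex given as nested tuples.

    Regex encoding (an assumption of this sketch):
      ("char", c) | ("eps",) | ("alt", r1, r2) | ("cat", r1, r2) | ("star", r1)
    Transition encoding:
      ("eps", q1) | ("branch", q1, q2) | ("char", c, q1)
    """
    T = {}
    counter = [0]

    def fresh():
        q = counter[0]
        counter[0] += 1
        return q

    def build(r, nxt):
        """Return the entry state of a fragment that continues at state nxt."""
        kind = r[0]
        if kind == "char":
            q = fresh()
            T[q] = ("char", r[1], nxt)
            return q
        if kind == "eps":
            q = fresh()
            T[q] = ("eps", nxt)
            return q
        if kind == "cat":
            return build(r[1], build(r[2], nxt))
        if kind == "alt":
            q = fresh()
            T[q] = ("branch", build(r[1], nxt), build(r[2], nxt))
            return q
        if kind == "star":
            q = fresh()
            # First successor re-enters the body (greedy); the second leaves the loop.
            T[q] = ("branch", build(r[1], q), nxt)
            return q
        raise ValueError(f"unknown regex node: {kind}")

    accept = fresh()
    q0 = build(r, accept)
    return set(range(counter[0])), q0, {accept}, T
```

Note that a repetition whose body can match the empty string (for instance ("star", ("eps",))) yields a cycle of Eps and Branch transitions in this sketch, so such inputs would need separate handling.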
Algorithm 1 a partial backtracking matching algorithm for NFAs

 1: function MATCH_{A,w}(q, i)
      Parameters: an NFA A, and an input string w
      Input: a current state q, and a current position i
      Output: returns SuccessAt(i′) if the matching succeeds, or
              returns Failure if the matching fails
 2:   (Q, q₀, F, T) = A
 3:   if q ∈ F then return SuccessAt(i)
 4:   else if T(q) = Eps(q′) then return MATCH_{A,w}(q′, i)
 5:   else if T(q) = Branch(q′, q″) then
 6:     result ← MATCH_{A,w}(q′, i)
 7:     if result = Failure then return MATCH_{A,w}(q″, i)
 8:     else return result
 9:   else if T(q) = Char(σ, q′) then
10:     if i < |w| and w[i] = σ then return MATCH_{A,w}(q′, i + 1)
11:     else return Failure


We collectively call Eps and Branch transitions ε-transitions. Later in this
paper, if there are consecutive ε-transitions, they may be shown as a single
transition in a figure. When a certain state returns to itself by ε-transitions,
such a sequence of ε-transitions is called an ε-loop. ε-loops are problematic
because they cause infinite loops in matching.

An ε-loop can be detected during matching by recording a position on an
input string when a state is visited. When an ε-loop is detected, several solutions
exist to deal with it (see, e.g., [30]), such as treating an ε-loop as a failure (e.g.,
JavaScript and RE2) or treating it as a success but escaping it (e.g., Perl). These
solutions can be easily adapted to our algorithms; therefore, for the simplicity
of presentation, we introduce the following assumption.


Assumption 1 (no ε-loops). NFAs do not contain ε-loops.


2.3 Backtracking Matching 


We present a basic backtracking matching algorithm for NFAs in Algorithm 1. 
It serves as a basis for optimization by memoization, both in [10] and in the 
current work. 

The function MATCH_{A,w} is called recursively in this algorithm, but it always
terminates under Assumption 1. It takes two parameters: A is an NFA, and w is an
input string. It also takes two arguments: q ∈ Q is the current state, and
i ∈ {0, …, |w|} is the current position on w. MATCH_{A,w}(q₀, i) for an NFA A =
(Q, q₀, F, T) returns SuccessAt(i′) with the matching position i′ ∈ {0, …, |w|} if
the matching with A succeeds from i to i′ on w, or returns Failure if the matching
fails.

The MATCH function implements partial matching: given the position i ∈
{0, …, |w|} of interest, one obtains, by running MATCH_{A,w}(q₀, i), one “matching
position” i′ (if it exists) such that w[i] w[i+1] … w[i′−1] is accepted by A. Note the
difference from total matching: given A and w, it returns true if (the whole) w is
accepted by A and false otherwise. The practical relevance of partial matching
should be clear, as we can use it for text search and replacement.

Fig. 2: the NFA A((a|a)*b)

Lines 5 to 8 in Algorithm 1 perform matching for Branch transitions. Here, 
the algorithm first tries matching from the first successor q’, and if that fails, 
it tries matching from the second successor q” with the same position. This 
behavior is called backtracking. 
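Algorithm 1 translates almost line by line into Python. The sketch below makes a few representation choices that are assumptions rather than part of the paper: transitions are tuples tagged "eps", "branch", or "char", SuccessAt(i′) is represented by the integer i′, and Failure by None. The example NFA for ab*|c is hand-built for illustration.

```python
def match(T, F, w, q, i):
    """Partial backtracking matching from state q at position i.

    Returns the matching position i' (SuccessAt(i')) or None (Failure).
    """
    if q in F:
        return i
    t = T[q]
    if t[0] == "eps":
        return match(T, F, w, t[1], i)
    if t[0] == "branch":
        result = match(T, F, w, t[1], i)      # try the first successor
        if result is None:
            return match(T, F, w, t[2], i)    # backtrack to the second one
        return result
    if t[0] == "char":
        if i < len(w) and w[i] == t[1]:
            return match(T, F, w, t[2], i + 1)
        return None
    raise ValueError(f"unknown transition: {t!r}")


# Hypothetical NFA for ab*|c with initial state 0 and accepting state 5.
T_EXAMPLE = {
    0: ("branch", 1, 4),
    1: ("char", "a", 2),
    2: ("branch", 3, 5),   # repeat b first (greedy), otherwise accept
    3: ("char", "b", 2),
    4: ("char", "c", 5),
}
F_EXAMPLE = {5}
```

On the input abbc starting at position 0, the greedy loop consumes both occurrences of b before falling back to the accepting state, so the returned position is 3.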

We define the regex partial matching problem using the function MATCH. 


Problem 1 (regex partial matching). 
Input: a regex r, an input string w, and a starting position i ∈ {0, …, |w|}
Output: returns MATCH_{A(r),w}(q₀, i) where A(r) = (Q, q₀, F, T).


Remark 1. One might find this problem formulation somewhat unusual. It requires,
as output, a specific matching position chosen by a specific algorithm MATCH, 
while a usual formulation would require an arbitrary matching position. We take 
this formulation since we aim to show that our optimization by memoization not 
only solves partial matching but also is consistent with an existing backtracking 
matching algorithm, in the sense we discussed in §1. We formulate consistency 
as correctness with respect to Prob. 1, that is, preserving the solution chosen by 
the specific algorithm MATCH. We also note that the algorithm MATCH mirrors 
many existing implementations of regex matching (cf. §2.2). 


2.4 Catastrophic Backtracking and ReDoS 


In the execution of the MATCH function (Algorithm 1), depending on an NFA 
A and an input string w, the number of recursive calls for the MATCH function 
may increase explosively, resulting in a very long matching time, as we will see 
in Example 1. This explosive increase in matching time is called catastrophic 
backtracking. 


Example 1 (catastrophic backtracking). Consider the NFA A = A((a|a)*b) =
(Q, q0, F, T) shown in Figure 2, and let w = "aⁿc" (the string that repeats a
n times and ends with c) be an input string. MATCH_{A,w}(q0, 0) invokes O(2ⁿ)
recursive calls before returning Failure. This explosion occurs because the
matching tries every combination of the q2-to-q3 and q4-to-q5 transitions for
each a in w.
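The blow-up in Example 1 can be reproduced with a small sketch. The Python code below is not Algorithm 1 verbatim; it is a minimal backtracking matcher in its style. The tuple encoding of transitions, the helper names `match` and `count_calls`, and the concrete state numbering for A((a|a)*b) are assumptions chosen to agree with the description above (Figure 2 is not reproduced here).

```python
# A minimal sketch of backtracking matching on an NFA with Eps/Branch/Char
# transitions. The encoding and state numbering are illustrative assumptions.

# Assumed NFA for (a|a)*b; state 7 is the only accepting state.
T = {
    0: ("branch", 1, 6),   # try the loop body first, otherwise leave the loop
    1: ("branch", 2, 4),   # choose between the two copies of `a`
    2: ("char", "a", 3),
    3: ("eps", 0),
    4: ("char", "a", 5),
    5: ("eps", 0),
    6: ("char", "b", 7),
}
F = {7}


def match(w, q, i, counter):
    """Return the end position of a match from (q, i), or None for Failure."""
    counter[0] += 1                        # count every recursive call
    if q in F:
        return i
    t = T[q]
    if t[0] == "eps":
        return match(w, t[1], i, counter)
    if t[0] == "branch":
        r = match(w, t[1], i, counter)     # first successor
        if r is not None:
            return r
        return match(w, t[2], i, counter)  # backtrack to the second successor
    _, c, nxt = t                          # Char transition
    if i < len(w) and w[i] == c:
        return match(w, nxt, i + 1, counter)
    return None


def count_calls(n):
    """Number of calls made while matching a^n c from state 0, position 0."""
    counter = [0]
    result = match("a" * n + "c", 0, 0, counter)
    assert result is None                  # the input has no b, so matching fails
    return counter[0]


if __name__ == "__main__":
    for n in (2, 4, 8, 12):
        print(n, count_calls(n))
```

Under these assumptions, each additional a roughly doubles the number of calls, which is the exponential behavior described in Example 1.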


Regex denial of service (ReDoS) is a security vulnerability caused by
catastrophic backtracking. In ReDoS, catastrophic backtracking causes a huge load
on servers, making them unable to respond in a timely manner. There are cases 
of service outages due to ReDoS at Stack Overflow in 2016 [12] and at Cloudflare 
in 2019 [16]. Additionally, a 2018 study [33] reported that over 300 web services 
have potential ReDoS vulnerabilities. Thus, ReDoS is a widespread problem in 
the real world, and there is a great need for countermeasures. 

According to a 2019 study [25], only 38% of developers are aware of Re- 
DoS. This study also found that many developers find it difficult not only to 
read regexes but also to find and validate regexes to match their desires. It is 
mentioned in [25] that developers use Internet resources such as Stack Overflow 
to find regexes. In recent years, it has also become common to use generative
AIs such as ChatGPT for such a purpose. However, when the authors asked
ChatGPT⁶ “Please suggest 10 regexes for validating email addresses”, 2 of
the 10 suggested regexes were vulnerable to ReDoS (see Table 1). Developers may
unknowingly use such vulnerable regexes. For this reason, it is important to develop
ReDoS countermeasures that can be achieved without the developer being aware 
of them. 

Matching speed-up avoids ReDoS by ensuring that matching time is linear in the
length of the input string, freeing developers from worrying about ReDoS. A
popular method for matching speed-up replaces backtracking (depth-first search)
over non-deterministic transitions with breadth-first search; it is called the
on-the-fly DFA construction [7,28]. However, since look-around and atomic
grouping are extensions based on backtracking (see §2.6), it is not obvious that
they can be supported by the on-the-fly DFA construction.

Memoization is another approach to ensuring linear-time backtracking match- 
ing; we pursue it in this paper. 


2.5 Regex Extensions: Look-around and Atomic Grouping 


Many real-world regexes come with various extensions for enhanced expressiv- 
ity [13]. In this paper, we are interested in two classes of extensions, namely 
look-around and atomic grouping.

Look-around. Look-around is a regex extension that allows constraints on
strings around a certain position. It is also called zero-width assertion (e.g., 
in [10]) because it does not consume any characters. Look-around consists of 
four types: positive or negative, and look-ahead or look-behind. 

⁶ We used ChatGPT 3.5 (September 25, 2023 version).
⁷ The second and third regexes are the same; they are the actual output of ChatGPT.


Table 1: the regexes given by ChatGPT for the question “Please suggest 10
regexes for validating email addresses”.⁷


suggested regex (ChatGPT’s comment)                               vulnerable?


^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
(Basic Email Validation)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$
(Basic Email Validation with TLD)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$
(Strict Email Validation)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(?:\.[a-zA-Z]{2,})?$
(Email Validation Allowing for Subdomains)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{1,}$
(Email Validation Allowing Single-Character Domain Name)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(?:\.[\p{L}\p{N}]{2,})?$
(Email Validation Allowing Internationalized Domain Names (IDNs))

^(?:"[\w\s]+")?([a-zA-Z0-9._%+-]+)@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
(Email Validation with Optional Quoted Local Part)

^(?:\([^()]*\)|[\w\s]+)?([a-zA-Z0-9._%+-]+)@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$      vulnerable
(Email Validation with Optional Comments)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$i
(Email Validation Allowing for Case-Insensitive Domain)

^(?:(?:[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})|([a-zA-Z0-9._%+-]+)\+[^@]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})$      vulnerable
(Email Validation with Support for Subaddressing)


Positive look-ahead is typically represented by the syntax (?=r); its matching
succeeds when, reading ahead from the current position of the input string, the
matching of the inner regex r succeeds. Note that the position for the overall
matching is not changed by the inner matching of r. For example, the regex
/(?=bc)/ matches the string "abc" from position 1 (i.e., after the first character
a) without consuming any characters.

The matching of a negative look-ahead (?!r) succeeds when the inner regex 
r is not matched. 

Positive or negative look-behind—denoted by (?<=r) or (?<!r), respectively— 
is similar to the above, with the difference that the inner matching of r is per- 
formed backward, i.e., from right to left. For example, the regex /(?<=ab)/ 
matches the string "abc" from position 2 (i.e., before the last character c) with- 
out consuming any characters. 

A typical use of look-around is to put a look-behind before (or a look-ahead 
after) a regex r. This is useful when one wants to perform a search or replacement 
of r for only those occurrences that are in a certain context. For example, the 
regex /(?<=<p>)[^<]*(?=<\/p>)/ matches only the contents of the HTML <p> tag.
As another example, common assertions such as \A (this matches the beginning
of a string) and \z (this matches the end) can be expressed using look-around,
namely \A = (?<!.) and \z = (?!.).
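The look-around examples above can be tried with Python's standard `re` module, which supports all four look-around operators. The snippet is only an illustration in one particular regex engine; the sample strings and variable names are assumptions made for this sketch. Note that `.` does not match a newline in Python unless `re.DOTALL` is given, so the look-around encodings of the start and end assertions use that flag.

```python
import re

# (?=bc) succeeds at position 1 of "abc" and consumes nothing.
ahead = re.search(r"(?=bc)", "abc")

# (?<=ab) succeeds at position 2 of "abc", again with an empty match.
behind = re.search(r"(?<=ab)", "abc")

# Look-behind plus look-ahead as context: only the contents of <p>...</p>.
html = "<b>skip</b><p>keep</p>"
contents = re.findall(r"(?<=<p>)[^<]*(?=</p>)", html)

# Start and end of string written with look-around. re.DOTALL lets `.` match
# newlines too, so "no character before/after" really means the string ends.
start = re.search(r"(?<!.)", "ab\ncd", re.DOTALL)
end = re.search(r"(?!.)", "ab\ncd", re.DOTALL)

if __name__ == "__main__":
    print(ahead.span(), behind.span())
    print(contents)
    print(start.span(), end.span())
```

All look-around matches here are empty: the reported spans have equal start and end positions, reflecting that look-around is a zero-width assertion.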

Atomic Grouping. Atomic grouping is a regex extension that controls backtracking
behavior. It is designed to let users manually avoid problems caused by
backtracking, such as catastrophic backtracking (§2.4).

Atomic grouping is represented by the syntax (?>r); once the matching of
the inner regex r succeeds, the remaining branches in potential backtracking for
matching r are discarded. For example, the regex /(a|ab)c/ matches the string
"abc", but the regex /(?>(a|ab))c/ using atomic grouping does not match it.
This is because, once a in the atomic grouping matches the first character a of
"abc", the remaining branch ab (in a|ab) is discarded, and one is left with the
regex c and the string "bc".

Atomic grouping is often used to prevent catastrophic backtracking. In that
case, it is used in combination with the repetition syntax, e.g., (?>(r*))
(often abbreviated as r*+) and (?>(r+)) (abbreviated as r++). These
abbreviations are called possessive quantifiers. The former (namely (?>(r*)))
is intuitively understood as (?>(ε|r|rr|...)), with the difference that longer
matching is preferred (this is because the Eps loop is the first successor in
Figure 1e). Once a longer match is found, the remaining branches (i.e., those
for shorter matches) get discarded, thus preventing catastrophic backtracking.

One might wonder if our (linear-time and thus ReDoS-free) matching algo- 
rithm should support atomic grouping—the principal use of atomic grouping 
is to suppress backtracking and avoid ReDoS. We do need to support it since, 
as we discussed in §1, ours is meant to be a drop-in replacement for matching 
implementations that are currently used. 

Our Target Extended Regexes. Our target class, namely regexes with look-around
and atomic grouping, is defined by the following grammar.


r ::= ...                 (the same as the regex definition, §2.1)
    | (?=r) | (?!r)       (positive and negative look-ahead)
    | (?<=r) | (?<!r)     (positive and negative look-behind)
    | (?>r)               (atomic grouping)


For brevity, we sometimes refer to regexes with look-around and atomic grouping
as (la, at)-regexes. We also refer to regexes with look-around as la-regexes and
regexes with atomic grouping as at-regexes.

For a (la, at)-regex r, the size of r, denoted by |r|, is defined in the same
way as for regexes, with the additional clauses |(?=r)| = |(?>r)| = |r| + 1.

Look-around is known to preserve regularity: la-regexes can be converted to
DFAs, and the class of languages they define coincides with the regular
languages. This fact is mentioned in [3,26,27]. Atomic grouping is also known to
preserve regularity in the same sense [4]. However, it is known that look-ahead
and atomic grouping can make the number of states of the corresponding DFA grow
exponentially [3,4,26,27].

In what follows, for simplicity, we only discuss positive look-ahead in dis- 
cussions of look-around. Adaptation to other look-around operators, such as 
negative look-behind, is straightforward.


2.6 NFAs with Sub-automata


We introduce NFAs with sub-automata for backtracking matching algorithms for 
(la, at)-regexes. This extended notion of NFAs is suggested in [10, Section IX.B], 
but it seems ours is the first formal exposition. 

Roughly speaking, an NFA with sub-automata is an NFA whose transitions
can be labeled with (in addition to a character σ ∈ Σ, as in usual NFAs)
another NFA with sub-automata. See Figure 3, where transitions from q0 to q1
are labeled with | r |, the NFA with sub-automata obtained by converting r. We
annotate these transitions further with a label (pla for positive look-ahead, at
for atomic grouping, etc.) that indicates which operator they arise from. Note
that NFAs with sub-automata can be nested: transitions in | r | in Figure 3 can
be labeled with NFAs with sub-automata, too.

Our precise definition is as follows. There, P is the set that collects all states 
that occur in an NFA with sub-automata A, i.e., in 1) the top-level NFA, 2) its 
label NFAs, 3) their label NFAs, and so on. 


Definition 1 (NFAs with sub-automata). An NFA with sub-automata A
is a quintuple A = (P, Q, q0, F, T) where P is a finite set of states and Q ⊆ P is a
set of so-called top-level states. We require that the quadruple (Q, q0, F, T) is an
NFA, except that the value T(q) of the transition function T is either 1) Eps(q′),
Branch(q′, q″), or Char(σ, q′) (as in usual NFAs, §2.2), or 2) Sub(k, A′, q′), where
A′ is an NFA with sub-automata, q′ is a successor state, and k ∈ {pla, nla, plb,
nlb, at} is a kind label.

We further impose the following requirements. Firstly, we require all NFAs
with sub-automata in A to have disjoint state spaces. That is, for any distinct
top-level states q, q″ ∈ Q in A, if T(q) = Sub(k, A′, q′) and T(q″) =
Sub(k′, A″, q‴), then we must have P′ ∩ P″ = ∅, Q ∩ P′ = ∅, and Q ∩ P″ = ∅,
where A′ = (P′, ...) and A″ = (P″, ...). Secondly, we require that the set P
in A = (P, ...) is the (disjoint) union of all states that occur within A, that is,


    P = Q ∪ ⋃_{q ∈ Q, T(q) = Sub(k, A′, q′), A′ = (P′, ...)} P′.


The kind label k in Sub(k, A′, q′) indicates how the sub-automaton A′ should
be used (cf. Algorithm 2). If every kind label occurring in A (including its sub- 
automata) is either pla, nla, plb, or nlb, then A is called a la-NFA. Similarly, if 
every kind label is at, A is called an at-NFA. Following this convention, general 
NFAs with sub-automata are called (la, at)-NFAs. 

Note that the definition is recursive. Non-well-founded nesting is prohibited, 
however, by the finiteness of P. By the definition, if P = Q, then A does not 
contain any transitions labeled with sub-automata. 
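As a rough illustration of Definition 1, the sketch below represents an NFA with sub-automata in Python and computes the set P from Q and the Sub labels, checking the disjointness requirement along the way. The class name `SubNFA`, the function `all_states`, the tuple encoding of transitions, and the example automaton are all assumptions introduced for this sketch, not notation from the definition.

```python
from dataclasses import dataclass


@dataclass
class SubNFA:
    """An NFA with sub-automata; P is derived from Q and the Sub labels."""
    Q: frozenset   # top-level states
    q0: str        # initial state
    F: frozenset   # accepting states
    T: dict        # state -> ("eps", q') | ("branch", q', q'')
                   #        | ("char", c, q') | ("sub", kind, SubNFA, q')


def all_states(A):
    """Compute P: Q plus, recursively, the states of every label NFA.

    Raises ValueError if the state spaces are not pairwise disjoint.
    """
    P = set(A.Q)
    for t in A.T.values():
        if t[0] == "sub":
            inner = all_states(t[2])
            if P & inner:
                raise ValueError("state spaces must be disjoint")
            P |= inner
    return P


# (?=a)b : a positive look-ahead for `a`, followed by the character `b`.
look = SubNFA(frozenset({"s0", "s1"}), "s0", frozenset({"s1"}),
              {"s0": ("char", "a", "s1")})
top = SubNFA(frozenset({"q0", "q1", "q2"}), "q0", frozenset({"q2"}),
             {"q0": ("sub", "pla", look, "q1"), "q1": ("char", "b", "q2")})

if __name__ == "__main__":
    print(sorted(all_states(top)))
    print(all_states(look) == set(look.Q))   # P = Q: no sub-automata inside
```

Because every level contributes only finitely many fresh states, the recursion in `all_states` terminates, mirroring the remark that finiteness of P rules out non-well-founded nesting.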

In addition to Eps and Branch transitions, we refer to Sub transitions with
a label k ∈ {pla, nla, plb, nlb} as ε-transitions too. We also assume the following,
similarly to Asm. 1.


Assumption 2. (la, at)-NFAs do not contain ε-loops.


[Figure 3: two NFA diagrams. (a) (?=r) (positive look-ahead): a transition from q0 to q1 labeled Sub(pla, | r |). (b) (?>r) (atomic grouping): a transition from q0 to q1 labeled Sub(at, | r |).]

Fig. 3: a conversion from (la, at)-regexes to (la, at)-NFAs. For negative
look-ahead, we use the corresponding kind label nla. For positive and negative
look-behind, besides using the kind labels plb and nlb, we suitably reverse | r |.


Algorithm 2 a partial backtracking matching algorithm for (la, at)-NFAs

 1: function MATCH-(la, at)_{A,w}(q, i)
      Parameters: a (la, at)-NFA A, and an input string w
      Input: a current state q, and a current position i
      Output: returns SuccessAt(i′) if the matching succeeds, or
              returns Failure if the matching fails
 2:   (P, Q, q0, F, T) = A
 3:   if q ∈ F then
      ...                                    ▷ Lines 3 to 11 are the same as in Algorithm 1
12:   else if T(q) = Sub(pla, A′, q′) then   ▷ positive look-ahead; other look-around ops. are similar
13:     (P′, Q′, q0′, F′, T′) = A′
14:     result ← MATCH-(la, at)_{A′,w}(q0′, i)
15:     if result = SuccessAt(i′) then return MATCH-(la, at)_{A,w}(q′, i)
16:     else return Failure
17:   else if T(q) = Sub(at, A′, q′) then    ▷ atomic grouping
18:     (P′, Q′, q0′, F′, T′) = A′
19:     result ← MATCH-(la, at)_{A′,w}(q0′, i)
20:     if result = SuccessAt(i′) then return MATCH-(la, at)_{A,w}(q′, i′)
21:     else return Failure


For (la, at)-regexes, their conversion to (la, at)-NFAs is described by the 
constructions in Figure 3—using transitions labeled with sub-automata—in ad- 
dition to the conversion for regexes in §2.2. Note that we have |P| = O(|r|) in 
these constructions. 

The backtracking matching algorithm in Algorithm 1 can be naturally ex- 
tended to (la, at)-NFAs; it is shown in Algorithm 2. The clauses for positive 
look-ahead (Lines 12 to 16) and atomic grouping (Lines 17 to 21) are similar to 
each other, conducting matching for sub-automata. Note that their difference is 
in the “return position” (i in Line 15; i′ in Line 20).

The clauses for other look-around operators are similar to the ones for
positive look-ahead. For look-behind, we can suitably use an additional parameter
d ∈ {−1, +1} that indicates the matching direction.
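The difference in the “return position” between look-ahead and atomic grouping can be seen in a small executable sketch of the extended backtracking matcher. The code is an illustration under assumptions, not the algorithm's reference implementation: automata are encoded as triples `(q0, F, T)`, Failure is represented by `None`, the clause for `nla` is one reading of "other look-around operators are similar", look-behind is omitted, and the example automata are written by hand.

```python
def match_la_at(A, w, q, i):
    """Sketch of MATCH-(la, at): returns an end position, or None for Failure.

    A is a triple (q0, F, T); T maps a state to one of
    ("eps", q'), ("branch", q', q''), ("char", c, q'), ("sub", kind, A', q').
    """
    _, F, T = A
    if q in F:
        return i
    t = T[q]
    if t[0] == "eps":
        return match_la_at(A, w, t[1], i)
    if t[0] == "branch":
        r = match_la_at(A, w, t[1], i)
        return r if r is not None else match_la_at(A, w, t[2], i)
    if t[0] == "char":
        _, c, nxt = t
        if i < len(w) and w[i] == c:
            return match_la_at(A, w, nxt, i + 1)
        return None
    _, kind, sub, nxt = t
    r = match_la_at(sub, w, sub[0], i)        # run the sub-automaton once
    if kind == "pla":                          # positive look-ahead: resume at i
        return match_la_at(A, w, nxt, i) if r is not None else None
    if kind == "nla":                          # negative look-ahead
        return match_la_at(A, w, nxt, i) if r is None else None
    if kind == "at":                           # atomic grouping: resume at i'
        return match_la_at(A, w, nxt, r) if r is not None else None
    raise ValueError(f"unsupported kind label: {kind}")


# a|ab as a sub-automaton (the alternative `a` is tried first).
ALT = ("s0", {"s4"}, {
    "s0": ("branch", "s1", "s2"),
    "s1": ("char", "a", "s4"),
    "s2": ("char", "a", "s3"),
    "s3": ("char", "b", "s4"),
})
# (?>(a|ab))c
ATOMIC = ("t0", {"t2"}, {"t0": ("sub", "at", ALT, "t1"),
                         "t1": ("char", "c", "t2")})
# (a|ab)c without atomic grouping
PLAIN = ("p0", {"p5"}, {
    "p0": ("branch", "p1", "p2"),
    "p1": ("char", "a", "p4"),
    "p2": ("char", "a", "p3"),
    "p3": ("char", "b", "p4"),
    "p4": ("char", "c", "p5"),
})
# (?=bc)
BC = ("l0", {"l2"}, {"l0": ("char", "b", "l1"), "l1": ("char", "c", "l2")})
AHEAD = ("h0", {"h1"}, {"h0": ("sub", "pla", BC, "h1")})

if __name__ == "__main__":
    print(match_la_at(PLAIN, "abc", "p0", 0))    # plain alternation
    print(match_la_at(ATOMIC, "abc", "t0", 0))   # atomic grouping
    print(match_la_at(AHEAD, "abc", "h0", 1))    # look-ahead from position 1
```

In this sketch the sub-automaton is run exactly once per Sub transition and only its first result is used, which is what makes the atomic-grouping clause discard the remaining branches.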

Using the extended backtracking matching algorithm (Algorithm 2), we de- 
fine the partial matching problem for (la, at)-regexes in the same way as for 
regexes without extensions (Prob. 1).


Problem 2 ((la, at)-regex partial matching).
Input: a (la, at)-regex r, an input string w, and a starting position i
Output: returns MATCH-(la, at)_{A(r),w}(q0, i) where A(r) = (P, Q, q0, F, T).


3 Previous Works on Regex Matching with Memoization 


This section introduces an existing work [10] on regex matching with memoiza- 
tion, paving the way for our algorithms for (la, at)-regexes in Sections 4 and 5. 

Memoization is a programming technique that makes recursive computations 
more efficient by 1) recording arguments of a function and the corresponding 
return values and 2) reusing them when the function is called with the recorded 
arguments. 

As we described in §2.3, regex matching is conducted by backtracking match- 
ing. It is implemented by recursive functions (see Algorithms 1 and 2); thus, it is 
a natural idea to apply memoization. Since Java 14, Java’s regex implementation 
has indeed used memoization for optimization. However, this optimization is not 
enough to completely prevent ReDoS; see, e.g., [24]. 

The work that inspires the current work the most is [10], whose main novelty 
is linear-time backtracking regex matching (much like the current work). Its 
contributions are as follows. 


1. Focusing on (non-extended) regexes (see §2.1), they introduce a backtracking 
matching algorithm that uses memoization. It achieves linear-time
complexity: for an input string w, its runtime is O(|w|).

2. They introduce selective memoization, by which they reduce the domain of 
the memoization table from Q x N to Qse x N. Here Qse is a subset of Q 
that is often much smaller. 

3. They introduce a memory-efficient compression method—based on run-length 
encoding (RLE)—for memoization tables. 

4. Finally, they discuss adaptations of the above method to extended regexes, 
namely REWZWA (the extension by look-around; look-around is called zero- 
width assertion in [10]) and REWBR (the extension by back-reference). 


We will mainly discuss item 1 above; it serves as a basis for our algorithms
in Sections 4 and 5. The technique in item 2 is potentially very relevant: we
expect that it can be combined with the current work; doing so is future work.
The content of item 2 is reviewed in [15, Appendix A] for the record.


Remark 2. On the above item 4, the work [10] claims that the time complexity 
of their algorithm is linear also for REWZWA (O(|w|) for an input string w). 
However, we believe that this claim comes with the following problems. 


— The description of an algorithm for REWZWA in [10] is abstract and leaves
room for interpretation. The description is to “preserve the memo functions
of the sub-automata throughout the simulation of the top-level M-NFA,
remembering the results from sub-simulations that begin at different indices i
of w” [10, Section IX-B]. For example, it is not explicit what the “results”
are: they can mean (complete) matching results or mere success/failure.


Algorithm 3 a total matching algorithm with memoization for NFAs without
ε-transitions [10, Listing 2].

1: function DavisSL^M_{A,w}(q, i)
     Parameters: an NFA A without ε-transitions, an input string w, and
                 a memoization table M: Q × ℕ ⇀ {false}
     Input: a current state q, and a current position i
     Output: returns true if the matching succeeds, or
             returns false if the matching fails
2:   (Q, q0, F, δ) = A
3:   if i = |w| then return whether q ∈ F
4:   if M(q, i) ≠ ⊥ then return M(q, i)
5:   for q′ ∈ δ(q, w[i]) do
6:     if DavisSL^M_{A,w}(q′, i + 1) then return true
7:   M(q, i) ← false
8:   return false


— Moreover, the part “that begin at different indices i of w” is problematic; 
we believe that remembering these results does not lead to linear-time com- 
plexity. This point is discussed later in Remark 4. 

— Besides, there is a gap between the algorithm described in the paper [10] 
and its prototype implementation [11], even for (non-extended) regexes. See 
Remark 3. 

— Because of this gap, the implementation [11] works in linear time for all
regexes, including REWZWA, but can lead to erroneous results for REWZWA.
See Remark 4.


Our contribution includes a correct memoization algorithm for look-around
(REWZWA) that resolves the above problems.


3.1 Linear-time Backtracking Matching with Memoization 


We describe the first main contribution of the work [10] (the item 1 in the above 
list), namely a backtracking algorithm that achieves a linear-time complexity 
thanks to memoization. The algorithm [10, Listing 2] is presented in Algorithm 3. 
In this algorithm DavisSL^M_{A,w}, an NFA A is a quadruple (Q, q0, F, δ) where
δ: Q × Σ → 2^Q is a non-deterministic transition function. An additional
parameter M: Q × ℕ ⇀ {false} is a memoization table, which is mathematically a
mutable partial function. This algorithm implements total matching (cf. §2.3).
It is notable that the memoization table records only matching failures: a
matching success does not have to be recorded since it immediately propagates
to the success of the whole problem.

Fig. 4: the NFA A((aa|aa)*b), after removing ε-transitions


Algorithm 4 a variant of Algorithm 3 implemented in their prototype [11]

1: function DavisSLImpl^M_{A,w}(q, i)
2:   (Q, q0, F, δ) = A
3:   if i = |w| then return whether q ∈ F
4:   if M(q, i) ≠ ⊥ then return M(q, i)
     M(q, i) ← false                          ▷ M(q, i) is speculatively set to false
5:   for q′ ∈ δ(q, w[i]) do
6:     if DavisSLImpl^M_{A,w}(q′, i + 1) then return true
7:   (M(q, i) ← false)                        ▷ deleted here; moved up
8:   return false


Algorithm 3 achieves linear-time matching and thus prevents ReDoS. A
full proof of linear-time complexity is found in [10, Appendix C], but its essence
is the following (note the critical role of memoization here).


— For any call DavisSL^M_{A,w}(q, i), if M(q, i) is defined, then the call does not
invoke any further recursive calls.

— When such a call returns false, the entry M(q, i) of the memoization table gets
defined (Line 7).

— As a consequence, the number of recursive calls of DavisSL^M_{A,w} is limited to
|Q| × |w|.


Example 2 (matching with memoization for NFAs without ε-transitions). Let us
consider the regex (aa|aa)*b and the corresponding NFA A((aa|aa)*b) defined
in §2.2. For the purpose of applying Algorithm 3, we manually remove its
ε-transitions, leading to the NFA in Figure 4. Let w = "a²ⁿc" be an input string.
MATCH_{A,w}(q0, 0) (without memoization) invokes O(2ⁿ) recursive calls for
the same reason as in Example 1, but DavisSL^{M0}_{A,w}(q0, 0) (with memoization,
where M0 is the initial memoization table) invokes O(n) recursive calls
because M(q0, i) for each position i ∈ {0, 2, ..., 2n} has been recorded after the
first visit.
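The effect described in Example 2 can be observed with a short Python sketch in the style of Algorithm 3. The transition table `DELTA` is an assumed ε-free NFA for (aa|aa)*b (Figure 4 is not reproduced here), and the `use_memo` flag and the `calls` helper are additions made only to compare call counts with and without the memoization table.

```python
# Assumed ε-free NFA for (aa|aa)*b: two parallel a-a paths from state 0 back
# to state 0, and a b-edge from state 0 to the accepting state 3.
DELTA = {
    (0, "a"): [1, 2],
    (1, "a"): [0],
    (2, "a"): [0],
    (0, "b"): [3],
}
F = {3}


def davis_sl(w, M, q, i, counter, use_memo=True):
    """Total matching in the style of Algorithm 3; M records failures only."""
    counter[0] += 1                       # count every recursive call
    if i == len(w):
        return q in F
    if use_memo and (q, i) in M:
        return M[(q, i)]
    for nxt in DELTA.get((q, w[i]), []):
        if davis_sl(w, M, nxt, i + 1, counter, use_memo):
            return True
    if use_memo:
        M[(q, i)] = False                 # recorded on exit, after all successors failed
    return False


def calls(n, use_memo):
    """Number of calls made on the input a^(2n) c, starting from state 0."""
    counter = [0]
    ok = davis_sl("a" * (2 * n) + "c", {}, 0, 0, counter, use_memo)
    assert ok is False
    return counter[0]


if __name__ == "__main__":
    for n in (2, 4, 8, 12):
        print(n, calls(n, use_memo=False), calls(n, use_memo=True))
```

With memoization the second path into state 0 at each even position is answered from the table, so the call count grows linearly in n, whereas without it the count doubles with each extra pair of a's.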


Remark 3. Following the discussion in Remark 2, here we describe the gap be- 
tween Algorithm 3—the algorithm described in the paper [10]—and its prototype 
implementation [11]. The latter is shown in Algorithm 4. 

The precise difference between the two algorithms is that Line 7 in Algo- 
rithm 3 is moved up to the moment just before the for-loop, in Algorithm 4. 
It is not hard to see that this modification does not affect the correctness of 
the algorithm: if the pair (q, i) is visited again in the future, it means that the
current matching from (q,i) did not succeed, and backtracking occurred. Note 
that, in case the current matching is successful, the function call returns true 
so the memoization content M (q, i) should not matter. 

However, the above argument is true only when there is no look-around. 
(A detailed discussion is in Example 3.) This point seems to be missed in the 
implementation [11].


Algorithm 5 a partial matching algorithm with memoization. An adaptation
of Algorithm 3 from [10], and a basis for our algorithms (Algorithms 6 and 7)

 1: function MEMO^M_{A,w}(q, i)
      Parameters: an NFA A, an input string w, and
                  a memoization table M: Q × ℕ ⇀ {Failure}
      Input: a current state q, and a current position i
      Output: returns SuccessAt(i′) if the matching succeeds, or
              returns Failure if the matching fails
 2:   (Q, q0, F, T) = A
 3:   if M(q, i) ≠ ⊥ then return M(q, i)
 4:   result ← ⊥
 5:   if q ∈ F then result ← SuccessAt(i)
 6:   else if T(q) = Eps(q′) then result ← MEMO^M_{A,w}(q′, i)
 7:   else if T(q) = Branch(q′, q″) then
 8:     result ← MEMO^M_{A,w}(q′, i)
 9:     if result = Failure then result ← MEMO^M_{A,w}(q″, i)
10:   else if T(q) = Char(σ, q′) then
11:     if i < |w| and w[i] = σ then result ← MEMO^M_{A,w}(q′, i + 1)
12:     else result ← Failure
13:   if result = Failure then M(q, i) ← Failure
14:   return result                            ▷ result ≠ ⊥, as one can easily see


3.2 Matching with Memoization Adapted to the Current Formalism 


In Algorithm 5, we present an adaptation of Algorithm 3 to our formalism,
especially our definition of NFA (§2.2) that offers fine-grained handling of
non-determinism. Algorithm 5 has also been adapted to solve partial matching (it
returns a matching position i′) rather than total matching as in Algorithm 3 (cf.
§2.3). Algorithm 5 serves as a basis for our extensions to look-around and
atomic grouping in Sections 4 and 5.

The adaptation is straightforward: Line 5 ensures that the algorithm solves 
partial matching; the rest is a natural adaptation of the for-loop of Algorithm 3 
to our definition of NFA (§2.2). The algorithm terminates thanks to Asm. 1. We 
note that the type of memoization tables does not have to be changed compared 
to Algorithm 3. 

Algorithm 5 exhibits the same desired properties as Algorithm 3, namely
correctness (with respect to Prob. 1) and linear-time complexity. We formally
state these properties for the record; here, M0: Q × ℕ ⇀ {Failure} is the initial
memoization table (every entry is undefined, i.e., ⊥).

Theorem 1 (linear-time complexity of Algorithm 5). For an NFA A =
(Q, q0, F, T), an input string w, and a position i ∈ {0,...,|w|}, MEMO^{M0}_{A,w}(q0, i)
terminates with O(|w|) recursive calls.

Theorem 2 (correctness of Algorithm 5). For an NFA A = (Q, q0, F, T),
an input string w, and a position i ∈ {0,...,|w|}, MATCH_{A,w}(q0, i) =
MEMO^{M0}_{A,w}(q0, i).


[Figures 5 and 6: NFA diagrams; the graphics are not reproduced here.]

Fig. 5: the la-NFA A(((?=a*)a)*)

Fig. 6: the at-NFA A(a*(?>a*)ab)


The proofs can be found in [15, Appendix B.1]. Here is their outline.


We first introduce the notion of run for MATCH and MEMO; it records recur- 
sive calls of the function itself, as well as invocations of the memoization table, 
together with their return values. 

For linear time complexity (Thm. 1), we show that 1) a recursive call with 
the same argument (q, i) appears at most once in a run, and that 2) the number 
of invocations of the memoization table with the same key (q, i) is bounded by 
the (graph-theoretic) in-degree. Linear-time complexity then follows easily. 

For correctness (Thm. 2), we introduce a conversion from runs of MEMO to 
runs of MATCH. By showing that 1) the result is indeed a valid run of MATCH 
and 2) the conversion preserves return values, we show the coincidence of the 
return values of the two algorithms, i.e., correctness. 
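Both properties can be checked empirically on small inputs. The following Python sketch follows the structure of Algorithm 5 on an assumed NFA for (a|a)*b; the state numbering, the string constant `FAILURE`, the convention that `M=None` switches memoization off (giving plain backtracking), and the helper `run` are assumptions made for this illustration.

```python
FAILURE = "Failure"

# Assumed NFA for (a|a)*b, encoded as state -> transition; 7 is accepting.
T = {
    0: ("branch", 1, 6),
    1: ("branch", 2, 4),
    2: ("char", "a", 3),
    3: ("eps", 0),
    4: ("char", "a", 5),
    5: ("eps", 0),
    6: ("char", "b", 7),
}
F = {7}


def memo(w, M, q, i, counter):
    """Partial matching from (q, i). M is a dict recording failures only;
    passing M=None disables memoization, giving plain backtracking."""
    counter[0] += 1
    if M is not None and (q, i) in M:
        return M[(q, i)]
    if q in F:
        result = i                                   # plays the role of SuccessAt(i)
    elif T[q][0] == "eps":
        result = memo(w, M, T[q][1], i, counter)
    elif T[q][0] == "branch":
        result = memo(w, M, T[q][1], i, counter)
        if result == FAILURE:
            result = memo(w, M, T[q][2], i, counter)
    else:                                            # Char transition
        _, c, nxt = T[q]
        if i < len(w) and w[i] == c:
            result = memo(w, M, nxt, i + 1, counter)
        else:
            result = FAILURE
    if M is not None and result == FAILURE:
        M[(q, i)] = FAILURE                          # record failures on exit
    return result


def run(w, use_memo):
    """Return (result, number of calls) for matching w from state 0, position 0."""
    counter = [0]
    result = memo(w, {} if use_memo else None, 0, 0, counter)
    return result, counter[0]


if __name__ == "__main__":
    for n in (2, 4, 8, 12):
        w = "a" * n + "c"
        print(n, run(w, use_memo=False), run(w, use_memo=True))
```

On these inputs the two variants return the same result while the memoized variant makes a number of calls that grows linearly with the input length, in line with Thm. 1 and Thm. 2.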


4 Memoization for Regexes with Look-around 


We describe our first main technical contribution, namely a backtracking match- 
ing algorithm for la-NFAs with memoization (Algorithm 6). We prove that it is 
correct (Thm. 4) and that its time complexity is linear (O(|w|), Thm. 3). 

The key ingredient of our algorithm is the type of memoization tables, whose
range is extended from {Failure} to {Failure, Success}. We motivate this
extension through two problematic algorithms, MEMOEXIT-la and MEMOENTER-la.
MEMOEXIT-la naively extends Algorithm 5 (MEMO) by adding the processing of
sub-automaton transitions with pla (positive look-ahead) done in Algorithm 2
(Lines 12 to 16). MEMOENTER-la is similar to MEMOEXIT-la, but it records to the
memoization table at the same timing as Algorithm 4 (DavisSLImpl), that is,
before the recursive calls. In particular, the memoization tables of both
algorithms record only failures.

The example below shows the problems with the two naive algorithms.
Specifically, MEMOEXIT-la is not linear and MEMOENTER-la is not correct.


Example 3. Consider the la-NFA A = A(((?=a*)a)*) = (P, Q, q0, F, T) shown in
Figure 5, and let w = "aⁿ" be an input string.


MEMOEXIT-la^{M0}_{A,w}(q0, 0) invokes O(|w|²) recursive calls, in the same
way as MATCH-(la, at), because there are no matching failures in A′ that
contribute to memoization.

We also see that MEMOENTER-la is not correct: MATCH-(la, at)_{A,w}(q0, 0)
returns SuccessAt(n), but MEMOENTER-la^{M0}_{A,w}(q0, 0) returns SuccessAt(1) because
M(q5, 1) = false is recorded during the first loop and interpreted as a matching
failure.


In Example 3, a natural solution to the non-linearity issues with MEMOEXIT-la 
is to enrich memoization so that it also records previous successes of look-around. 
Furthermore, since matching positions do not matter in look-around, the type 
of memoization tables should be M: P × ℕ ⇀ {Failure, Success}.


Remark 4. The work [10, Section IX-B] proposes an adaptation of their
memoization algorithm to REWZWA. Its description in [10, Section IX-B] (to “preserve
the memo functions...”; see Remark 2) consists of the following two points:


1. preserving the memoization tables of the sub-automata throughout the whole 
matching, and 

2. recording the results of sub-automata matching from different start positions 
i of w. 


The naive algorithm MEMOEXIT-la we discussed above implements the first 
point. We can further add the second point (that is essentially “memoization for 
sub-automaton matching”) to MEMOEXIT-la. 

However, we find that this is not enough to ensure linear-time complexity. 
The problem is that the “memoization for sub-automaton matching” is used too 
infrequently. For example, in Example 3, the start positions of sub-automaton 
matching are different each time; thus, the memoized results are never used. 

Our algorithm (Algorithm 6) resolves this problem by letting the memoiza- 
tion tables (for sub-automaton matching) record results not only for starting 
positions but also for non-starting positions. 

We also note that there is a gap between the algorithm in the paper [10] and 
its prototype implementation [11]; see Remark 3. The latter is linear time but 
not always correct. For example, in Example 3, the correct result is SuccessAt(n), 
but the prototype [11] returns SuccessAt(1), similarly to MEMOENTER-la. 


Algorithm 6 is the matching algorithm for la-NFAs that we propose. It adopts
the above extended type of M. In Line 18, Success is recorded in the
memoization table when the matching succeeds. This function can return one of
SuccessAt(i′), Failure, and Success. We first prove the following lemma to see
that the algorithm indeed solves the partial matching problem (Prob. 2).

Lemma 1. For a la-NFA A = (P, Q, q0, F, T), an input string w, and a
position i ∈ {0,...,|w|}, MEMO-la^{M0}_{A,w}(q0, i) returns either SuccessAt(i′) for
i′ ∈ {0,...,|w|} or Failure (it does not return Success).

Proof. When we obtain Success as a return value, it must be via an entry M(q, i)
of the memoization table. However, due to Asm. 2, when M(q, i) is set to Success
for a state q of the top-level automaton of A, the matching is already finished
and returns SuccessAt(i′).


Algorithm 6 our partial matching algorithm with memoization for la-NFAs

 1: function MEMO-la^M_{A,w}(q, i)
      Parameters: a la-NFA A, an input string w, and
                  a memoization table M: P × ℕ ⇀ {Failure, Success}
      Input: a current state q, and a current position i
      Output: returns SuccessAt(i′) if the matching succeeds,
              returns Success if a matching success is in M (cf. Lemma 1), or
              returns Failure if the matching fails
 2:   (P, Q, q0, F, T) = A
 3:   if M(q, i) ≠ ⊥ then return M(q, i)
 4:   result ← ⊥
 5:   if q ∈ F then
      ...                                        ▷ Lines 5 to 12 are the same as in Algorithm 5
13:   else if T(q) = Sub(pla, A′, q′) then
14:     (P′, Q′, q0′, F′, T′) = A′
15:     result ← MEMO-la^M_{A′,w}(q0′, i)
16:     if result = SuccessAt(i′) or Success then
17:       result ← MEMO-la^M_{A,w}(q′, i)
18:   if result = SuccessAt(i′) or Success then M(q, i) ← Success
19:   else if result = Failure then M(q, i) ← Failure
20:   return result


As a consequence of the lemma, we can further shrink the memoization tables 
in Algorithm 6 by not recording Success for M (q, i) where q is a state of the top- 
level automaton. 

Algorithm 6 exhibits the desired properties, namely correctness (with respect 
to Prob. 2) and linear-time complexity. 

Theorem 3 (linear-time complexity of Algorithm 6). For a la-NFA
A = (P, Q, q0, F, T), an input string w, and a position i ∈ {0,...,|w|},
MEMO-la^{M0}_{A,w}(q0, i) terminates with O(|w|) recursive calls.

Theorem 4 (correctness of Algorithm 6). For a la-NFA A =
(P, Q, q0, F, T), an input string w, and a position i ∈ {0,...,|w|},
MATCH-(la, at)_{A,w}(q0, i) = MEMO-la^{M0}_{A,w}(q0, i).

Thm. 3 and 4 can be shown similarly to Thm. 1 and 2; see [15, Appendix B.2]. 
The in-degree for sub-automata requires some additional care. 
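To make the behavior of Algorithm 6 on Example 3 concrete, here is a hedged Python sketch restricted to positive look-ahead. The la-NFA for ((?=a*)a)* is an assumed encoding (Figure 5 is not reproduced here), SuccessAt(i′) is represented by the integer i′, and passing `M=None` is a convention of this sketch that turns memoization off so that the call counts can be compared with plain backtracking.

```python
FAILURE, SUCCESS = "Failure", "Success"

# Assumed la-NFA for ((?=a*)a)*; automata are triples (q0, F, T).
SUB = ("q5", {"q8"}, {                    # sub-automaton A' for a*
    "q5": ("branch", "q6", "q8"),
    "q6": ("char", "a", "q7"),
    "q7": ("eps", "q5"),
})
TOP = ("q0", {"q4"}, {
    "q0": ("branch", "q1", "q4"),
    "q1": ("sub", "pla", SUB, "q2"),
    "q2": ("char", "a", "q3"),
    "q3": ("eps", "q0"),
})


def memo_la(A, w, M, q, i, counter):
    """Sketch of MEMO-la; M maps (state, position) to FAILURE or SUCCESS."""
    counter[0] += 1
    if M is not None and (q, i) in M:
        return M[(q, i)]
    _, F, T = A
    if q in F:
        result = i                                    # SuccessAt(i)
    elif T[q][0] == "eps":
        result = memo_la(A, w, M, T[q][1], i, counter)
    elif T[q][0] == "branch":
        result = memo_la(A, w, M, T[q][1], i, counter)
        if result == FAILURE:
            result = memo_la(A, w, M, T[q][2], i, counter)
    elif T[q][0] == "char":
        _, c, nxt = T[q]
        if i < len(w) and w[i] == c:
            result = memo_la(A, w, M, nxt, i + 1, counter)
        else:
            result = FAILURE
    else:                                             # ("sub", "pla", A', q')
        _, _, sub, nxt = T[q]
        result = memo_la(sub, w, M, sub[0], i, counter)
        if result != FAILURE:                         # SuccessAt(i') or Success
            result = memo_la(A, w, M, nxt, i, counter)
    if M is not None:                                 # record successes as well
        M[(q, i)] = FAILURE if result == FAILURE else SUCCESS
    return result


def run(n, use_memo):
    """Return (result, number of calls) on the input a^n."""
    counter = [0]
    r = memo_la(TOP, "a" * n, {} if use_memo else None, "q0", 0, counter)
    return r, counter[0]


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, run(n, use_memo=False), run(n, use_memo=True))
```

In this sketch the look-ahead started at position 0 records Success for the sub-automaton states at every later position, so later look-aheads are answered from the table; the top-level call still returns a position, as Lemma 1 states.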


5 Memoization for Regexes with Atomic Grouping 


We describe our second main technical contribution, namely a backtracking 
matching algorithm for at-NFAs with memoization (Algorithm 7). We prove that 
it is correct (Thm. 6) and that its time complexity is linear (O(|w|), Thm. 5). 

The key ingredient of our algorithm is the type of memoization tables, whose
range is extended from {Failure} to {Failure(j) | j ∈ {0,...,ν(A)}}; the
latter records a depth j of atomic grouping in order to distinguish failures of
different depths. We motivate this extension through two problematic algorithms,
MEMOEXIT-at and MEMOENTER-at. Much like in §4, MEMOEXIT-at naively
extends Algorithm 5 (MEMO) by adding the processing of sub-automaton
transitions with at done in Algorithm 2 (Lines 17 to 21), and MEMOENTER-at is
similar to MEMOEXIT-at, but records to the memoization table at the same
timing as Algorithm 4 (DavisSLImpl).

Firstly, we observe that MEMOEXIT-at is not linear for a reason similar to 
Example 3. (A concrete example is given by Example 4.) Therefore, we turn to 
the other candidate, namely MEMOENTER-at. 

We find, however, that MEMOENTER-at is also problematic. It is not correct. 


Example 4. Consider the at-NFA
$A = \mathcal{A}(\texttt{a*(?>a*)ab}) = (P, Q, q_0, F, T)$ shown in Figure 6,
and let $w = \texttt{"a"}^{n}\,\texttt{"b"}$ be an input string.
$\text{MATCH-(la, at)}_{A,w}(q_0, 0)$ returns Failure (the atomic grouping
(?>a*) consumes all a’s in $w$ and no a is left for the final ab pattern), but
$\text{MEMOENTER-at}^{M_0}_{A,w}(q_0, 0)$ returns SuccessAt($n + 1$). Thus
MEMOENTER-at is wrong.

For both algorithms, the state q7 in the at transition is first reached at
position i = n, and then backtracking is conducted, leading to the state q7
again at i = n − 1. The execution of MEMOENTER-at proceeds as follows.


– The first execution path consumes all a’s in the loop from q0 to q2, reaches
q7 with i = n, eventually leading to failure at q4 and thus to backtracking.
Speculative memoization (M(q, i) ← false in Algorithm 4) is conducted in
its course; in particular, M(q7, n) = false is recorded.

– After backtracking, the second execution path reaches q7 with i = n − 1; it
then goes around the loop inside A′ once and reaches q7 with i = n. Now it
uses the memoized value M(q7, n) = false (cf. Line 4 of Algorithm 4), leading
to backtracking to q7 with i = n − 1. It then takes the other branch at q7,
and the matching for A′ succeeds. Therefore, the execution reaches q4 with
i = n − 1, and the whole matching succeeds.
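The failure mode of Example 4 can be reproduced with a small script. The following is a minimal sketch, not the paper's implementation: the state numbering, the tuple encoding of transitions, and the function name `match` are assumptions made for illustration, and the automaton is a hand-built encoding of a*(?>a*)ab that need not agree with Figure 6. Passing a set as `memo` switches on speculative failure recording on entry, in the spirit of MEMOENTER-at.

```python
# Hypothetical encoding of an at-NFA for a*(?>a*)ab (state numbers are illustrative).
T = {
    0: ("branch", 1, 3),
    1: ("char", "a", 2),
    2: ("eps", 0),
    3: ("sub", 7, 4),      # atomic group: run the sub-automaton from 7, continue at 4
    4: ("char", "a", 5),
    5: ("char", "b", 6),
    7: ("branch", 8, 10),
    8: ("char", "a", 9),
    9: ("eps", 7),
}
# State 6 accepts the whole regex, state 10 accepts the sub-automaton. Using one
# set is safe here because the two state sets are disjoint.
ACCEPT = {6, 10}


def match(T, accept, w, q, i, memo=None):
    """Backtracking match from state q at position i; returns an end position or None.

    If memo is a set, (q, i) is recorded as failed on entry and reused later,
    which is the flawed scheme discussed in Example 4.
    """
    if memo is not None:
        if (q, i) in memo:
            return None
        memo.add((q, i))
    if q in accept:
        return i
    t = T[q]
    if t[0] == "eps":
        return match(T, accept, w, t[1], i, memo)
    if t[0] == "branch":
        r = match(T, accept, w, t[1], i, memo)
        return r if r is not None else match(T, accept, w, t[2], i, memo)
    if t[0] == "char":
        if i < len(w) and w[i] == t[1]:
            return match(T, accept, w, t[2], i + 1, memo)
        return None
    # "sub": only the first successful exit of the atomic group is used.
    end = match(T, accept, w, t[1], i, memo)
    return None if end is None else match(T, accept, w, t[2], end, memo)


w = "aaab"                                   # n = 3
plain = match(T, ACCEPT, w, 0, 0)            # backtracking without memoization
flawed = match(T, ACCEPT, w, 0, 0, memo=set())
```

Under these assumptions, `plain` is `None` (the match fails, as atomic grouping demands) while `flawed` is `4`, that is n + 1: the entry recorded for the sub-automaton's first state at position n while the ambient automaton was failing is later reused to steer backtracking inside the atomic group.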


The last example shows the challenge we are facing, namely the need to
distinguish failures of different depths. Specifically, in the previous example,
the memoized value M(q7, n) = false comes from the failure of matching for the
ambient automaton A; still, it is used to control backtracking in the
sub-automaton A′. This is problematic in an atomic grouping where, roughly
speaking, backtracking in an ambient automaton should not cause backtracking in
a sub-automaton. Atomic grouping can be nested, so we must track at which depth
failure has happened.


Definition 2 (nesting depth of atomic grouping). For an at-NFA
$A = (P, Q, q_0, F, T)$ and a state $q \in P$, the nesting depth of atomic
grouping for $q$, denoted by $\nu_A(q)$, is

\[
\nu_A(q) =
\begin{cases}
0 & \text{if } q \in Q, \\
1 + \nu_{A'}(q) & \text{where } A' = (P', Q', q_0', F', T')
  \text{ s.t. } T(q') = \mathrm{Sub}(\mathrm{at}, A', q'') \text{ and } q \in P'.
\end{cases}
\]

Algorithm 7 our partial matching algorithm with memoization for at-NFAs

 1: function $\text{MEMO-at}^{M}_{A_0,A,w}(q, i)$
      Parameters: an at-NFA $A_0$, a sub-automaton $A$ of $A_0$ (it can be $A_0$ itself),
        an input string $w$, and a memoization table
        $M\colon P \times \mathbb{N} \rightharpoonup \{\mathrm{Failure}(j) \mid j \in \{0, \dots, \nu(A_0)\}\}$
      Input: a current state $q$, and a current position $i$
      Output: returns SuccessAt($i'$, $K$) if the matching succeeds, or
        returns Failure($j$) if the matching fails
 2:   $(P, Q, q_0, F, T) = A$
 3:   if $M(q, i) \neq \bot$ then return $M(q, i)$
 4:   result ← $\bot$
 5:   if $q \in F$ then result ← SuccessAt($i$, $\emptyset$)
 6:   else if $T(q) = \mathrm{Eps}(q')$ then result ← $\text{MEMO-at}^{M}_{A_0,A,w}(q', i)$
 7:   else if $T(q) = \mathrm{Branch}(q', q'')$ then
 8:     result ← $\text{MEMO-at}^{M}_{A_0,A,w}(q', i)$
 9:     if result = Failure($j$) and $j = \nu_{A_0}(q)$ then
10:       result ← $\text{MEMO-at}^{M}_{A_0,A,w}(q'', i)$
11:       if result = Failure($j'$) then result ← Failure($\min(j, j')$)
12:   else if $T(q) = \mathrm{Char}(\sigma, q')$ then
13:     if $i < |w|$ and $w[i] = \sigma$ then result ← $\text{MEMO-at}^{M}_{A_0,A,w}(q', i + 1)$
14:     else result ← Failure($\nu_{A_0}(q)$)
15:   else if $T(q) = \mathrm{Sub}(\mathrm{at}, A', q')$ then
16:     $(P', Q', q_0', F', T') = A'$
17:     result ← $\text{MEMO-at}^{M}_{A_0,A',w}(q_0', i)$
18:     if result = SuccessAt($i'$, $K'$) then
19:       result ← $\text{MEMO-at}^{M}_{A_0,A,w}(q', i')$
20:       if result = SuccessAt($i''$, $K''$) then result ← SuccessAt($i''$, $K' \cup K''$)
21:       else if result = Failure($j$) then
22:         for $k \in K'$ do $M(k)$ ← Failure($j$)
23:     else if result = Failure($j$) and $j > \nu_{A_0}(q)$ then result ← Failure($\nu_{A_0}(q)$)
24:   if result = SuccessAt($i'$, $K$) then result ← SuccessAt($i'$, $K \cup \{(q, i)\}$)
25:   else if result = Failure($j$) then $M(q, i)$ ← Failure($j$)
26:   return result


We also define the maximum nesting depth of atomic grouping for $A$, denoted
by $\nu(A)$, as $\nu(A) = \max_{q \in P} \nu_A(q)$.
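As an illustration of Definition 2 and of the maximum nesting depth, the following sketch computes both for a small nested structure. It assumes a reading in which Q holds the states owned by an automaton itself and P additionally holds the states of all nested sub-automata; the dictionary layout and the names `own_states`, `subs`, and `nesting_depths` are hypothetical.

```python
def nesting_depths(automaton, depth=0, out=None):
    """Return {state: nesting depth} for an automaton and all nested sub-automata.

    `automaton` is a dict with two keys (an assumed encoding):
      "own_states": the states that belong to this automaton itself,
      "subs":       the sub-automata reached through atomic-grouping transitions.
    """
    out = {} if out is None else out
    for q in automaton["own_states"]:
        out[q] = depth                      # depth 0 for the automaton's own states
    for sub in automaton["subs"]:
        nesting_depths(sub, depth + 1, out)  # one level deeper per atomic group
    return out


# Shape of a regex such as a*(?>b*(?>c*))d: two nested atomic groups.
inner = {"own_states": ["q5", "q6"], "subs": []}
middle = {"own_states": ["q3", "q4"], "subs": [inner]}
top = {"own_states": ["q0", "q1", "q2"], "subs": [middle]}

nu = nesting_depths(top)
max_depth = max(nu.values())
```

Here `nu` maps the top-level states to 0, the states of the first atomic group to 1, and those of the innermost group to 2, so `max_depth` is 2. This bounds the number of distinct Failure(j) values a memoization table entry can take.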


Algorithm 7 is our algorithm for at-NFAs; the type of its memoization tables
is $M\colon P \times \mathbb{N} \rightharpoonup \{\mathrm{Failure}(j) \mid j \in \{0, \dots, \nu(A)\}\}$.
Some remarks are in order.

Note first that the algorithm takes, as its parameters, the whole at-NFA $A_0$
and its sub-automaton $A$ as the algorithm’s current scope. The top-level call
is made with $A_0 = A$ (cf. Thm. 5 and 6); when an at transition is
encountered, the scope goes to the corresponding sub-automaton ($A'$ in
Line 17).

In Line 9, the if condition checks that the nesting depth of Failure is the
depth of the current NFA, and backtracking is performed if and only if it is
true. This approach is crucial for avoiding the error in Example 4. The
remaining cases (Eps, Branch, and Char) are similar to Algorithm 5.
The case for Sub (Lines 15-23) requires some explanation. It is an adaptation
of Lines 17-21 of Algorithm 2 with memoization. The apparent complication
comes from the set $K$ in SuccessAt($i'$, $K$). The set $K$ is a set of keys
for a memoization table $M$, that is, pairs $(q, i)$ of a state and a position.
The role of $K$ is to collect the set of keys of $M$ for which, once failure
happens, the entry Failure($j$) has to be recorded (this is done in a batch
manner in Line 22). More specifically, once failure happens in an outer
automaton (i.e., at a smaller depth $j$), this has to be recorded as
Failure($j$) for inner automata (at greater depths). The set $K$ collects those
keys for which this has to be done, starting from inner automata ($A'$,
Line 18) and going to outer ones ($A$, Lines 19-20).

A closer inspection reveals that Line 20 is vacuous in Algorithm 7; however, 
it is needed when we combine it with look-around at the end of the section. 
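To make the control flow of Algorithm 7 concrete, here is a minimal Python sketch of it. Several simplifications are assumptions of this sketch rather than part of the paper: transitions are plain tuples, all states live in one transition table `T`, the current scope is represented implicitly by a single accept set (sound only because top-level and sub-automaton states are disjoint), and `depth` is a precomputed map from states to their nesting depth. The state numbering of the example automaton is hypothetical.

```python
def memo_at(T, accept, depth, w, M, q, i):
    """Sketch of Algorithm 7. Returns ("ok", end, keys) or ("fail", j).

    M maps (state, position) to the depth j of a recorded Failure(j).
    """
    if (q, i) in M:                                   # Line 3
        return ("fail", M[(q, i)])
    t = T.get(q)
    if q in accept:                                   # Line 5
        result = ("ok", i, frozenset())
    elif t[0] == "eps":                               # Line 6
        result = memo_at(T, accept, depth, w, M, t[1], i)
    elif t[0] == "branch":                            # Lines 7-11
        result = memo_at(T, accept, depth, w, M, t[1], i)
        if result[0] == "fail" and result[1] == depth[q]:
            j = result[1]                             # backtrack only at matching depth
            result = memo_at(T, accept, depth, w, M, t[2], i)
            if result[0] == "fail":
                result = ("fail", min(j, result[1]))
    elif t[0] == "char":                              # Lines 12-14
        if i < len(w) and w[i] == t[1]:
            result = memo_at(T, accept, depth, w, M, t[2], i + 1)
        else:
            result = ("fail", depth[q])
    else:                                             # "sub", Lines 15-23
        result = memo_at(T, accept, depth, w, M, t[1], i)
        if result[0] == "ok":
            _, end, inner_keys = result
            result = memo_at(T, accept, depth, w, M, t[2], end)
            if result[0] == "ok":
                result = ("ok", result[1], inner_keys | result[2])
            else:
                for k in inner_keys:                  # Line 22: batch recording
                    M[k] = result[1]
        elif result[1] > depth[q]:
            result = ("fail", depth[q])
    if result[0] == "ok":                             # Line 24
        result = ("ok", result[1], result[2] | {(q, i)})
    else:                                             # Line 25
        M[(q, i)] = result[1]
    return result


# a*(?>a*)ab with hypothetical state numbers; states 7..10 form the atomic group.
T1 = {0: ("branch", 1, 3), 1: ("char", "a", 2), 2: ("eps", 0), 3: ("sub", 7, 4),
      4: ("char", "a", 5), 5: ("char", "b", 6),
      7: ("branch", 8, 10), 8: ("char", "a", 9), 9: ("eps", 7)}
DEPTH1 = {q: 0 for q in range(7)} | {q: 1 for q in range(7, 11)}
table1 = {}
r1 = memo_at(T1, {6, 10}, DEPTH1, "aaab", table1, 0, 0)

# (?>a*)b, which does match "aab".
T2 = {0: ("sub", 3, 1), 1: ("char", "b", 2),
      3: ("branch", 4, 6), 4: ("char", "a", 5), 5: ("eps", 3)}
DEPTH2 = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}
r2 = memo_at(T2, {2, 6}, DEPTH2, "aab", {}, 0, 0)
```

With these inputs, `r1` is `("fail", 0)`, matching the expected Failure for Example 4, and `r2` starts with `("ok", 3, ...)`. The table never holds more than one entry per (state, position) pair, which is the source of the linear bound on recursive calls.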

Algorithm 7 exhibits the desired properties, namely correctness (with respect
to Prob. 2) and linear-time complexity. In Thm. 6, $f$ is a function that
converts results of Algorithm 7 to results of Algorithm 2; it is defined by
$f(\mathrm{Failure}(j)) = \mathrm{Failure}$ and
$f(\mathrm{SuccessAt}(i', K)) = \mathrm{SuccessAt}(i')$.

Theorem 5 (linear-time complexity of Algorithm 7). For an at-NFA
$A = (P, Q, q_0, F, T)$, an input string $w$, and a position
$i \in \{0, \dots, |w|\}$, $\text{MEMO-at}^{M_0}_{A,A,w}(q_0, i)$ terminates
with $O(|w|)$ recursive calls.

Theorem 6 (correctness of Algorithm 7). For an at-NFA
$A = (P, Q, q_0, F, T)$, an input string $w$, and a position
$i \in \{0, \dots, |w|\}$,
$\text{MATCH-(la, at)}_{A,w}(q_0, i) = f\bigl(\text{MEMO-at}^{M_0}_{A,A,w}(q_0, i)\bigr)$.


Thm. 5 and 6 are proved similarly to Thm. 1 and 2; see [15, Appendix B.3]. The 
following points require some extra care. 

Firstly, for linear-time complexity (Thm. 5), there is another recursive call 
(Line 19) before the return value of a recursive call (Line 17) is memoized 
(Line 22). If the second recursive call (Line 19) eventually leads to (the same 
call as) the first call (Line 17) (let’s call this event (*)), then this can nullify the 
effect of memoization. We prove, as a lemma, that (*) never happens. 

Secondly, for correctness (Thm. 6), our conversion of runs should replace 
an invocation of the memoization table—if it returns a failure with a shallower 
depth—with not only the corresponding run (as before) but also the run of the 
second recursive call (Line 19). See [15, Appendix B.3] for details. 

Combination with Look-around It is also possible to combine Algorithm 6 (for
look-around) and Algorithm 7 (for atomic grouping). In this case, the type of
memoization tables becomes
$M\colon P \times \mathbb{N} \rightharpoonup \{\mathrm{Failure}(j) \mid j \in \{0, \dots, \nu(A)\}\} \cup \{\mathrm{Success}\}$,
and nesting depths of atomic grouping are reset by look-around operators. A
complete algorithm can be found in [15, Appendix C]; it also exhibits the
desired properties.


6 Experiments and Evaluation 


Implementation We implemented the algorithm proposed in this paper for 
evaluation. We call our implementation memo-regex. It is written in 1368 lines 
of Scala. 
Table 2: our benchmark regexes and input strings

r1  /^(?=^.{1,254}$)(^(?:(?!\.|-)([a-z0-9\-\*]{1,63}|([a-z0-9\-]{1,62}[a-z0-9]))\.)+(?:[a-z]{2,})$)$/s
    input: w1 = "0." "0.0a."^n "\u0000", complexity: O(2^n)
    https://regexlib.com/REDetails.aspx?regexp_id=3494

r2  [regex illegible in the source]
    input: w2 = "g"^n [remainder illegible], complexity: O(n^2)
    https://regexlib.com/REDetails.aspx?regexp_id=938

r3  /(?<=[\w\s](?:[\.\!\?]+[\x20]*[\x22\xBB]*))(?:\s+(?![\x22\xBB](?!\w)))/
    input: w3 = "\""^n " ", complexity: O(n^2)
    https://regexlib.com/REDetails.aspx?regexp_id=2355

r4  [regex partly illegible in the source]
    input: w4 = "<" "aaa"^n ">", complexity: O(n^2)
    https://regexlib.com/REDetails.aspx?regexp_id=373


memo-regex supports both look-around (i.e., look-ahead and look-behind)
and atomic grouping. We implemented a regex parser ourselves. Backtracking is
implemented by managing a stack manually rather than using a recursive function,
to prevent stack overflow. In this case, the memoization keys are pushed
onto the stack. Recording these keys in a memoization table is done during
backtracking. We used the mutable HashMap from the Scala standard library as a
data structure for memoization tables.
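The explicit-stack technique described above can be sketched as follows. This is an illustration in Python rather than the Scala of memo-regex, and it covers only plain NFAs with Eps, Branch, and Char transitions, without look-around or atomic grouping. The marker scheme, the function name `match_iterative`, and the example automaton for (a|aa)*c are assumptions of this sketch; it also assumes the automaton has no loop made of ε-transitions only.

```python
def match_iterative(T, accept, w, q0, i0=0):
    """Backtracking with failure memoization, using an explicit stack.

    Stack entries are ("visit", q, i) for work to do, or ("memo", q, i), a key
    that is recorded as failed when it is popped during backtracking.
    """
    failed = set()
    stack = [("visit", q0, i0)]
    while stack:
        kind, q, i = stack.pop()
        if kind == "memo":
            # Everything pushed above this marker was explored without success.
            failed.add((q, i))
            continue
        if (q, i) in failed:
            continue
        if q in accept:
            return i
        stack.append(("memo", q, i))
        t = T[q]
        if t[0] == "eps":
            stack.append(("visit", t[1], i))
        elif t[0] == "branch":
            stack.append(("visit", t[2], i))   # second alternative, tried later
            stack.append(("visit", t[1], i))   # first alternative, tried next
        elif i < len(w) and w[i] == t[1]:      # "char"
            stack.append(("visit", t[2], i + 1))
    return None


# (a|aa)*c, a classic source of exponential backtracking on inputs like "aaaa...".
T = {0: ("branch", 1, 5), 1: ("branch", 2, 3), 2: ("char", "a", 0),
     3: ("char", "a", 4), 4: ("char", "a", 0), 5: ("char", "c", 6)}
ACCEPT = {6}

hit = match_iterative(T, ACCEPT, "aaac", 0)
miss = match_iterative(T, ACCEPT, "a" * 40, 0)
```

Here `hit` is `4` and `miss` is `None`. Because a key is recorded only when its marker is popped, each (state, position) pair is expanded at most once, so the failing input is handled in a number of steps proportional to its length instead of exponentially many.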

memo-regex also supports capturing sub-matchings. However, this feature 
cannot be used within atomic grouping and positive look-around because sub- 
matching information is lost for memoization. 

The code of memo-regex, as well as all experiment scripts, is available [14]. 


Efficiency of Our Algorithm We conducted experiments to assess the per- 
formance of our memo-regex, in particular in comparison with other existing 
implementations. 

As target regexes, we looked for those with look-around and/or atomic grouping
in the real-world regexes posted on regexlib.com. We then identified, by manual
inspection, four regexes r1, ..., r4 that are subject to potential catastrophic
backtracking. These regexes are shown in Table 2. We then crafted input
strings w1, ..., w4, respectively, so that they cause catastrophic backtracking.
Specifically, r1 contains positive look-ahead and negative look-ahead; this
positive look-ahead is used for restricting the length of input strings. The
regexes r2 and r3 are themselves positive look-ahead and look-behind,
respectively; both include negative look-ahead, too. The regex r4 includes
atomic grouping and negative look-ahead.

For these regexes, we measured matching time using memo-regex on OpenJDK
21.0.1. We compared it with the following implementations: Node.js 20.5.0,
Ruby 3.1.4, and PCRE2 10.42 (used by PHP 8.3.1, w/ or w/o JIT). All of these
implementations use backtracking; Ruby and PCRE2 have restrictions on regexes
inside look-behind and Node.js does not support atomic grouping. The
experiments were performed 10 times and the average was adopted. Furthermore,
for memo-regex, we measured the size of its memoization table by the memory
usage, using jamm. The experiments were conducted on a MacBook Pro 2021
(Apple M1 Pro, RAM: 32 GB).

Fig. 8: matching time for r2, r3 and r4

We show the results in Figures 7 and 8. Note that the values of n are different
depending on whether the matching time complexity is O(n^2) or O(2^n). Results
for some implementations are absent for r3 and r4 because of the syntactic
restrictions discussed above.

In Figures 7 and 8, we observe clear performance advantages of memo-regex. 
In particular, its linear-time complexity and linear memory consumption (mem- 
oization table size) are experimentally confirmed. 

Real-world Usage of Look-around and Atomic Grouping We addition- 
ally surveyed the use of the regex extensions of our interest, in order to confirm 
their practical relevance. 

We used a regex dataset collected by a 2019 survey [9]. This dataset contains 
537,806 regexes collected from the source code of real-world products. 


8 https://github.com/jbellis/jamm
We tallied the usage of each regex extension by parsing these regexes in the
dataset with our parser in memo-regex. 8,679 regexes could not be parsed or
compiled; this is due to back-reference for 4,360 regexes, unsupported syntax
(Unicode character class, conditional branching, etc.) for 4,134 regexes, and too
large or semantically invalid regexes for the other 184 regexes. We adopted the
remaining 529,127 regexes for tallying.


The result is shown in Table 3. Note that 1) the numbers for look-ahead and
look-behind do not include simple zero-width assertions such as ^ (line-begin)
or $ (line-end), and 2) that of atomic grouping includes possessive quantifiers
such as *+ and ++.

Table 3: regex ext. usage

feature                      # of regexes
(total)                      529,127
positive look-ahead          7,476 (1.4%)
negative look-ahead          6,917 (1.3%)
positive look-behind         [illegible in the source]
negative look-behind         [illegible in the source]
atomic grouping              1,113 (0.2%)
at least one of the above    17,167 (3.2%)

In Table 3, we observe that 17,167 regexes (3.2%) in the dataset use at least
one of the extensions we studied in this paper. While the ratio is not very
large, the absolute number (17,167 regexes) is significant; this implies that
there are a number of applications (such as web services) that rely on the
regex extensions. Thereby we confirm the practical relevance of these regex
extensions.
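The percentages quoted for Table 3 follow from the counts by simple division. The short check below recomputes them from the figures stated in the text; only rows whose counts are legible are included, and the variable names are illustrative.

```python
collected = 537_806          # regexes in the dataset
unparsed = 8_679             # regexes that could not be parsed or compiled
total = collected - unparsed

counts = {
    "positive look-ahead": 7_476,
    "negative look-ahead": 6_917,
    "atomic grouping": 1_113,
    "at least one of the above": 17_167,
}
# Share of each feature among the regexes used for tallying, in percent.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
```

The computed `total` is 529,127, and `shares` gives 1.4, 1.3, 0.2, and 3.2 percent respectively, in agreement with the table.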


7 Conclusions and Future Work 


In this paper, we proposed a backtracking algorithm with memoization for
regexes with look-around and atomic grouping. It is the first linear-time
backtracking matching algorithm for such regexes. It also fixes problems of the
memoization matching algorithm in [10] for look-ahead. We implemented the
algorithm; our experimental evaluation confirms its performance advantage.

One direction of future work is to support more extensions. Our implemen- 
tation does not support a widely used regex extension, namely back-references. 
In the recent work [10], back-reference was supported by additionally record- 
ing captured positions in memoization tables. We expect that a similar idea is 
applicable to our algorithm. 

Combination with selective memoization (used in [10]; see [15, Appendix A])
is another direction. We believe it is possible, but it will require a more detailed
discussion on how to handle sub-automata in the selective memoization schema.

Abstract. We present a compositional denotational semantics for a
functional language with first-class parallel composition and shared-memory
operations whose operational semantics follows the Release/Acquire
weak memory model (RA). The semantics is formulated in Moggi’s monadic
approach, and is based on Brookes-style traces. To do so we adapt
Brookes’s traces to Kang et al.’s view-based machine for RA, and supplement
Brookes’s mumble and stutter closure operations with additional
operations, specific to RA. The latter provides a more nuanced understanding
of traces that uncouples them from operational interrupted executions.
We show that our denotational semantics is adequate and use it
to validate various program transformations of interest. This is the first
work to put weak memory models on the same footing as many other
programming effects in Moggi’s standard monadic approach.


Keywords: Weak memory models · Release/Acquire · Shared state · Shared
memory · Concurrency · Denotational semantics · Monads · Program refinement ·
Program equivalence · Compiler optimizations


1 Introduction 


Denotational semantics defines the meaning of programs compositionally, where
the meaning of a program term is a function of the meanings assigned to its
immediate syntactic constituents. This key feature makes denotational semantics
instrumental in understanding the meaning of a piece of code independently of
the context under which the code will run. This style of semantics contrasts
with standard operational semantics, which only executes closed/whole programs.
A basic requirement of such a denotation function $\llbracket - \rrbracket$ is
for it to be adequate w.r.t. a given operational semantics: plugging program
terms $M$ and $N$ with equal denotations, i.e.
$\llbracket M \rrbracket = \llbracket N \rrbracket$, into some program context
$\Xi[-]$ that closes over their variables results in observationally
indistinguishable closed programs in the given operational semantics. Moreover,
assuming that denotations have a defined order ($\leq$), a “directed” version
of adequacy ensures that $\llbracket M \rrbracket \leq \llbracket N \rrbracket$
implies that all behaviors exhibited by $\Xi[M]$ under the operational
semantics are also exhibited by $\Xi[N]$.

For shared-memory concurrent programming, Brookes’s seminal work [13]
defined a denotational semantics, where the denotation $\llbracket M \rrbracket$
is a set of totally ordered traces of $M$ closed under certain operations,
called stutter and mumble. Traces consist of sequences of memory snapshots that
$M$ guarantees to provide while relying on its environment to make other memory
snapshots. Brookes [12] used the insights behind this semantics to develop a
semantic model for separation logic, and Turon and Wand used them to design a
separation logic for refinement. Additionally, Xu et al. used traces as a
foundation for the Rely/Guarantee approach for verification of concurrent
programs, and Liang et al. used a trace-based program logic for refinement.

© The Author(s) 2024


A memory model decides what outcomes are possible from the execution of
a program. Brookes [13] established the adequacy of the trace-based
denotational semantics w.r.t. the operational semantics of the strongest model,
known as sequential consistency (SC), where every memory access happens
instantaneously and immediately affects all concurrent threads. However, SC is
too strong to model real-world shared memory, whether it be of modern hardware,
such as x86-TSO and ARM, or of programming languages such as C/C++ and
Java [37]. These runtimes follow weak memory models that allow performant
implementations, but admit more behaviors than SC.


Do weak memory models admit adequate Brookes-style denotational semantics?
This question has been answered affirmatively once, by Jagadeesan et al., who
closely followed Brookes to define denotational semantics for x86-TSO. Other
weak memory models, in particular, models of programming languages, and
non-multi-copy-atomic models, where writes can be observed by different threads
in different orders, have so far been out of reach of Brookes’s totally ordered
traces, and were captured only by much more sophisticated models based on
partial orders.


In this paper we target the Release/Acquire memory model (RA, for short).
This model, obtained by restricting the C/C++11 memory model to
Release/Acquire atomics, is a well-studied fundamental memory model weaker than
x86-TSO, which, roughly speaking, ensures “causal consistency” together with
“per-location-SC” and “RMW (read-modify-write) atomicity”. These assurances
make RA sufficiently strong for implementing common synchronization idioms. RA
allows more performant implementations than SC, since, in particular, it allows
the reordering of a write followed by a read from a different location, which
is commonly performed by hardware, and it is non-multi-copy-atomic, thus
allowing less centralized architectures like POWER.


Our first contribution is a Brookes-style denotational semantics for RA. As
Brookes’s traces are totally ordered, this result may seem counterintuitive. The
standard semantics for RA is a declarative (a.k.a. axiomatic) memory model, in
the form of acyclicity consistency constraints over partially ordered candidate
execution graphs. Since these graphs are not totally ordered, one might expect
that Brookes’s traces are insufficient. Nevertheless, our first key observation
is that an operational presentation of RA as an interleaving semantics of a weak
memory system lends itself to Brookes-style semantics. For that matter, we
develop a notion of traces compatible with Kang et al.’s “view-based” machine,
an operational semantics that is equivalent to RA’s declarative formulation. Our
main technical result is the (directed) adequacy of the proposed Brookes-style
semantics w.r.t. that operational semantics of RA.

A main challenge when developing a denotational semantics lies in making it
sufficiently abstract. While full abstraction is often out of reach, as a
yardstick, we want our semantics to be able to justify various compiler
transformations/optimizations that are known to be sound under RA. Indeed,
an immediate practical application of a denotational semantics is the ability to
provide local formal justifications of program transformations, such as those
performed by optimizing compilers. In this setting, to show that an optimization
$N \twoheadrightarrow M$ is valid amounts to showing that replacing $N$ by $M$
anywhere in a larger program does not introduce new behaviors, which follows
from $\llbracket M \rrbracket \subseteq \llbracket N \rrbracket$ given a
directionally adequate denotation function $\llbracket - \rrbracket$.

To support various compiler transformations, we close our denotations under
certain operations, including analogs to Brookes’s stutter and mumble, but
also several RA-specific operations, that allow us to relate programs which
would naively correspond to rather different sets of traces. Given these closure
operations, our semantics validates standard program transformations, including
structural transformations, algebraic laws of parallel programming, and all
known thread-local RA-valid compiler optimizations. Thus, the denotational
semantics is instrumental in formally establishing validity of transformations
under RA, which is a non-trivial task.

Our second contribution is to connect the core semantics of parallel
programming languages exhibiting weak behaviors to the more standard semantic
account for programming languages with effects. Brookes presented his semantics
for a simple imperative WHILE language, but Benton et al. and Dvir et al.
later recast it atop Moggi’s monad-based approach, which uses a functional,
higher-order core language. In this approach the core language is modularly
extended with effect constructs to denote program effects. In particular, we
define parallel composition as a first-class operator. This is in contrast to
most of the research on weak memory models, which employs imperative languages
and assumes a single top-level parallel composition.

A denotational semantics given in this monadic style comes ready-made with
a rich semantic toolkit for program denotation, transformations, reasoning,
etc. We challenge and reuse this diverse toolkit throughout
the development. We follow a standard approach and develop specialized logical
relations to establish the compositionality property of our proposed semantics;
its soundness, which allows one to use the denotational semantics to show that
certain outcomes are impossible under RA; and adequacy. This development
puts weak memory models, which often require bespoke and highly specialized
presentations, on a similar footing to many other programming effects.




Outline. In §2 we lay the groundwork for the rest of the paper by introducing
the programming language that we will use (§2.1), the main ideas that underpin
trace-based denotational semantics (§2.2), and the operational presentation of
RA (§2.3). In §3 we present the core aspects of our denotational semantics.
First, we discuss our extension of RA’s operational semantics with first-class
parallelism, which enables denotations to be defined for concurrent composition
(§3.1). We then present RA traces (§3.2) and use them to define the denotations
of key program constructs (§3.3). Next, we show how the restriction of traces
within denotations (§3.4) and the addition of closure operations (§3.5) make our
denotational semantics more abstract. The denotational semantics extends to the
entire programming language standardly using Moggi’s monad-based approach
(§3.6). With the denotational semantics in place, we present our main results in
§4. Finally, we conclude and discuss related work in §5. More details are
available in the extended version of this paper.


2 Preliminaries 


We first introduce the language and its operational semantics under the
Sequential Consistency (SC) memory model (§2.1). We then outline Brookes’s
denotational semantics for SC (§2.2). Finally, we introduce Kang et al.’s
operational presentation of Release/Acquire (RA) (§2.3).


2.1 Language and Operational Semantics 


The programming language we use is an extension of a functional language with 
shared-state constructs. Program terms M and N can be composed sequentially 
explicitly as M;N or implicitly by left-to-right evaluation in the pairing construct 
(M,N). They can be composed in parallel as M || N. We assume preemptive 
scheduling, thus imposing no restrictions on the interleaving execution steps 
between parallel threads. To introduce the memory-access constructs, we present 
the well-known message passing litmus test, adapted to the functional setting: 


    (x := 1 ; y := 1) || (y?, x?)    (MP)


Here, x and y refer to distinct shared memory locations. Assignment
$\ell := v$ stores the value $v$ at location $\ell$ in memory, and dereference
$\ell?$ loads a value from $\ell$. The language also includes atomic
read-modify-write (RMW) constructs. For example, assuming integer storable
values, FAA($\ell$, $v$) (Fetch-And-Add) atomically adds $v$ to the value
stored in $\ell$. In contrast, interleaving is permitted between the
dereferencing, adding, and storing in $\ell := (\ell? + v)$. The underlying
memory model dictates the behavior of the memory-access constructs more
specifically.

In the functional setting, execution results in a returned value: $\ell := v$
returns the unit value (), i.e. the empty tuple; $\ell?$, and the RMW
constructs such as FAA($\ell$, $v$), return the loaded value; M ; N returns
what N returns; and (M, N), as well as M || N, return the pair consisting of the
return value of M and the return value of N. We assume left-to-right execution
of pairs, so in the (MP) example (y?, x?) steps to (v, x?) for a value v that
can be loaded from y, and (v, x?) steps to (v, w) for a value w that can be
loaded from x. In between, the left side of the parallel composition (||) can
take steps.

A Denotational Approach to Release/Acquire Concurrency

We can use intermediate results in subsequent computations via let binding:
let a = M in N binds the result of M to a in N. Thus, we execute M first,
and substitute the resulting value V for a in N before executing
$N[a \mapsto V]$. Similarly, we deconstruct pairs by matching:
match M with (a, b). N binds the components of the pair that M returns to a and
b respectively in N. The first and second projections fst and snd, as well as
the operation swap that swaps the pair constituents, are defined using match
standardly.


Sequential consistency. In the strongest memory model of Sequential
Consistency (SC), every value stored is immediately made available to every
thread, and every dereference must load the latest stored value. Thus the
underlying memory model uses maps from locations to values for the memory state
that evolves during program execution. Given an initial state, the behavior of a
program in SC depends only on the choice of interleaving of steps. Though any
such map can serve as an initial state, litmus tests are traditionally designed
with the memory that sets all values to 0 in mind. In (MP), the order of the two
stores and the two loads ensures that executions under SC may return
((), (0, 0)), ((), (0, 1)), and ((), (1, 1)), but not ((), (1, 0)).
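The set of SC outcomes for (MP) can be checked by brute force over all interleavings. The sketch below is an illustration only: the tuple encoding of memory operations and the helper names `interleavings` and `run_sc` are assumptions, and memory is initialised to 0 as is customary for litmus tests.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in chosen else next(ib) for k in range(n)]


# Left thread: x := 1 ; y := 1.  Right thread: (y?, x?), evaluated left to right.
writer = [("store", "x", 1), ("store", "y", 1)]
reader = [("load", "y"), ("load", "x")]


def run_sc(schedule):
    """Execute one interleaving under SC; return the value of the parallel composition."""
    mem = {"x": 0, "y": 0}
    loaded = []
    for op in schedule:
        if op[0] == "store":
            mem[op[1]] = op[2]          # a store is visible to everyone at once
        else:
            loaded.append(mem[op[1]])   # a load reads the latest stored value
    return ((), tuple(loaded))          # left side returns unit, right side a pair


outcomes = {run_sc(s) for s in interleavings(writer, reader)}
```

The resulting `outcomes` set contains exactly `((), (0, 0))`, `((), (0, 1))`, and `((), (1, 1))`. The outcome `((), (1, 0))` never appears, since reading 1 from y implies the earlier store to x has already happened.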


Observations. An observable behavior of an entire program is a value it may 
evaluate to from given initial memory values. While programs may internally 
interact and observe the memory, we do not consider it feasible to observe the 
memory directly. 


2.2 Overview of Brookes’s Trace-based Semantics 


Observable behavior as defined for whole programs is too crude to study
program terms that can interact with the program context within which they
run. Indeed, compare $M_1$ defined as x := 1 ; y := 1 ; y? versus $M_2$ defined
as x := 1 ; y := x? ; y?. Under SC, the difference between them as whole
programs is unobservable: starting from any initial state both return 1. Now
consider them within the program context $- \parallel x := 2$. That is, compare
$M_1 \parallel x := 2$ versus $M_2 \parallel x := 2$. In the first, $M_1$ still
always returns 1; but in the second, $M_2$ can also return 2 by interleaving the
store of 2 in x immediately after the store of 1 in x. Thus, if
$\llbracket M \rrbracket$, i.e. $M$’s denotation, were to simply map initial
states to possible results according to executions of $M$, we could not define
$\llbracket M \parallel N \rrbracket$ in terms of $\llbracket M \rrbracket$ and
$\llbracket N \rrbracket$ alone, because we would have
$\llbracket M_1 \rrbracket = \llbracket M_2 \rrbracket$ but also
$\llbracket M_1 \parallel x := 2 \rrbracket \neq \llbracket M_2 \parallel x := 2 \rrbracket$.
We conclude that $\llbracket M \rrbracket$ must contain more information on $M$
than an “input-output” relation; it must account for interference by the
environment.
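The difference between the two terms can be observed by enumerating SC interleavings with and without the interfering store. This is a sketch under stated assumptions: each term is flattened into atomic steps over a hypothetical three-instruction encoding (`const`, `load`, `store`), the final load writes to a register named `r` whose value is taken as the term's result, and memory starts at 0.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in chosen else next(ib) for k in range(n)]


def results(term, env):
    """Values the term may return under SC when run in parallel with env."""
    outcomes = set()
    for schedule in interleavings(term, env):
        mem, regs = {"x": 0, "y": 0}, {}
        for op, dst, src in schedule:
            if op == "const":
                mem[dst] = src           # store a constant to memory
            elif op == "load":
                regs[dst] = mem[src]     # load from memory into a register
            else:
                mem[dst] = regs[src]     # "store": write a register to memory
        outcomes.add(regs["r"])
    return outcomes


# M1 is x := 1 ; y := 1 ; y?   and   M2 is x := 1 ; y := x? ; y?
M1 = [("const", "x", 1), ("const", "y", 1), ("load", "r", "y")]
M2 = [("const", "x", 1), ("load", "t", "x"), ("store", "y", "t"), ("load", "r", "y")]
ENV = [("const", "x", 2)]               # the context: - || x := 2

alone = (results(M1, []), results(M2, []))
in_context = (results(M1, ENV), results(M2, ENV))
```

Run in isolation, both terms yield `{1}`. In the context, M1 still yields `{1}` while M2 yields `{1, 2}`, so a denotation that records only the isolated input-output behavior cannot be compositional.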


Adequacy in SC. A prominent approach to define compositional semantics for
concurrent programs is due to Brookes [13], who defined a denotational semantics
for SC by taking $\llbracket M \rrbracket$ to be a set of traces of $M$ closed
under certain rewrite rules as we detail below. Brookes established a
(directional) adequacy theorem: if
$\llbracket M \rrbracket \supseteq \llbracket N \rrbracket$ then the
transformation $M \twoheadrightarrow N$ is valid under SC. The latter
means that, when assuming SC-based operational semantics, $M$ can be replaced
by $N$ within a program without introducing new observable behaviors for it.
Thus, adequacy formally grounds the intuition that the denotational semantics
soundly captures behavior of program terms.

As a particular practical benefit, formal and informal simulation arguments
which are used to justify transformations in operational semantics can be
replaced by cleaner and simpler proofs based on the denotational semantics. For
example, a simple argument shows that
$\llbracket x := v ; x := w \rrbracket \supseteq \llbracket x := w \rrbracket$
holds in Brookes’s semantics. Thanks to adequacy, this justifies Write-Write
Elimination (WW-Elim) $x := v ; x := w \twoheadrightarrow x := w$ in SC.


Traces in SC. In Brookes’s semantics, a program term is denoted by a set of
traces, each trace consisting of a sequence of transitions. Each transition is
of the form $\langle \mu, \rho \rangle$, where $\mu$ and $\rho$ are memories,
i.e. maps from locations to values. A transition describes a program term’s
execution relying on a memory state $\mu$ in order to guarantee the memory
state $\rho$.

For example, $\llbracket x := w \rrbracket$ includes all traces of the form
$\boxed{\langle \mu, \mu[x := w] \rangle}$, where $\mu[x := w]$ is equal to
$\mu$ except for mapping x to w. The definition is compositional: the traces in
$\llbracket x := v ; x := w \rrbracket$ are obtained from sequential
compositions of traces from $\llbracket x := v \rrbracket$ with traces from
$\llbracket x := w \rrbracket$, obtaining all traces of the form
$\boxed{\langle \mu, \mu[x := v] \rangle \langle \rho, \rho[x := w] \rangle}$.
Such a trace relies on $\mu$ in order to guarantee $\mu[x := v]$, and then
relies on $\rho$ in order to guarantee $\rho[x := w]$. Allowing
$\rho \neq \mu[x := v]$ reflects the possibility of environment interference
between the two store instructions. Indeed, when denoting parallel composition
$\llbracket M \parallel N \rrbracket$ we include all traces obtained by
interleaving transitions from a trace from $\llbracket M \rrbracket$ with
transitions from a trace from $\llbracket N \rrbracket$. By sequencing and
interleaving, one subterm’s guarantee can fulfill the requirement which another
subterm relies on. They may also relegate reliances and guarantees to their
mutual context.

In the functional setting, executions not only modify the state but also return
values. In this setting, traces are pairs, which we write as
$\boxed{\xi} \therefore r$, where $\xi$ is the sequence of transitions and $r$
represents the final value that the program term guarantees to return. For
example, the semantics of dereference $\llbracket x? \rrbracket$ includes all
traces of the form $\boxed{\langle \mu, \mu \rangle} \therefore \mu(x)$. Indeed,
the execution of x? does not change the memory and returns the value loaded
from x. In the semantics of assignment $\llbracket x := v \rrbracket$, instead
of $\boxed{\langle \mu, \mu[x := v] \rangle}$ we have
$\boxed{\langle \mu, \mu[x := v] \rangle} \therefore ()$.


Rewrite rules in SC. Were denotations in Brookes’s semantics defined to only
include the traces explicitly mentioned above, they would not be abstract enough
to justify (WW-Elim), which eliminates redundant writes. Indeed, we only saw
traces with two transitions in ⟦x := v ; x := w⟧, but in ⟦x := w⟧ we saw traces
with one. The semantics would still be adequate, but it would lack abstraction.
This is where Brookes’s second main idea comes into play, making the denotations
more abstract by closing them under two operations that rewrite traces:


Stutter adds a transition of the form ⟨μ, μ⟩ anywhere in the trace. Intuitively,
a program term can always guarantee what it relies on.

Mumble combines a pair of consecutive transitions of the form ⟨μ, ρ⟩ ⟨ρ, θ⟩
into a single transition ⟨μ, θ⟩ anywhere in the trace. Intuitively, a program
term can always omit a guarantee to the environment, and rely on its own
omitted guarantee instead of relying on the environment.


Denotations in Brookes’s semantics are defined to be sets of traces closed
under rewrite rules: applying a rewrite to a trace in the set results in a trace
that is also in the set. For example, ⟦x := w⟧ is the least closed set with all traces
of the form [⟨μ, μ[x := w]⟩] ∴ ⟨⟩, and ⟦x := v ; x := w⟧ is the least closed set with
all sequential compositions of traces from ⟦x := v⟧ with traces from ⟦x := w⟧.


Closure under these rules makes traces in ⟦M⟧ correspond precisely to
interrupted executions of M, which are executions of M in which the memory can
arbitrarily change between steps of execution. Each transition ⟨μ, ρ⟩ in a trace in
⟦M⟧ corresponds to multiple execution steps of M that transition μ into ρ, and
each gap between transitions accounts for possible environment interruption.
The rewrite rules maintain this correspondence: stutter corresponds to taking 0
steps, and mumble corresponds to taking n + m steps instead of taking n steps
and then m steps when the environment did not change the memory in between.
The adequacy proof is based on this precise correspondence. In particular,
the single-pair traces in ⟦M⟧ correspond to the (uninterrupted) executions, the
“input-output” relation, of M.


Abstraction in SC. Brookes’s semantics is fully abstract, meaning that the
converse to adequacy also holds: if N ↠ M is valid under SC, then ⟦N⟧ ⊇ ⟦M⟧.
However, the proof relies on an artificial program construct, await, that
permits waiting for a specified memory snapshot and then stepping (atomically)
to a second specified memory snapshot. Thus, in realistic languages, where this
construct is unavailable, Brookes’s full abstraction proof does not apply.
Nevertheless, even without full abstraction, one can still provide evidence
that an adequate semantics is abstract by ensuring that it supports known
transformations. As an example, we show directly that ⟦x := v ; x := w⟧ ⊇ ⟦x := w⟧
holds in Brookes’s semantics. Since ⟦x := v ; x := w⟧ is closed, it suffices to show
that ⟦x := v ; x := w⟧ ⊇ { [⟨μ, μ[x := w]⟩] ∴ ⟨⟩ | μ a memory }. For a memory μ, we
have [⟨μ, μ[x := v]⟩ ⟨ρ, ρ[x := w]⟩] ∴ ⟨⟩ ∈ ⟦x := v ; x := w⟧ for every memory ρ, in
particular when ρ = μ[x := v]. Since ρ[x := w] = μ[x := v][x := w] = μ[x := w],
we have [⟨μ, μ[x := v]⟩ ⟨μ[x := v], μ[x := w]⟩] ∴ ⟨⟩ ∈ ⟦x := v ; x := w⟧. After
applying mumble, we have [⟨μ, μ[x := w]⟩] ∴ ⟨⟩ ∈ ⟦x := v ; x := w⟧.
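
The argument above can be replayed mechanically on a small instance. The following Python sketch is only an illustration under our own assumptions: a memory is a sorted tuple of (location, value) pairs, a trace is a tuple of transitions (return values are omitted since both sides return unit), and the helper names update, store_trace, stutter and mumble are hypothetical.

```python
def update(mem, loc, val):
    """Return mem[loc := val] for a memory given as sorted (loc, value) pairs."""
    d = dict(mem)
    d[loc] = val
    return tuple(sorted(d.items()))

def store_trace(mem, loc, val):
    """The one-transition trace of `loc := val` started from memory mem."""
    return ((mem, update(mem, loc, val)),)

def stutter(trace, i, mem):
    """Insert the transition (mem, mem) at position i."""
    return trace[:i] + ((mem, mem),) + trace[i:]

def mumble(trace, i):
    """Merge transitions i and i+1 when the memory in between is unchanged."""
    (m1, r1), (m2, r2) = trace[i], trace[i + 1]
    if r1 != m2:
        raise ValueError("mumble needs matching intermediate memories")
    return trace[:i] + ((m1, r2),) + trace[i + 2:]

mu = (("x", 0), ("y", 0))
mid = update(mu, "x", 1)                      # mu[x := v] with v = 1
two_step = store_trace(mu, "x", 1) + store_trace(mid, "x", 2)
one_step = mumble(two_step, 0)                # collapses to mu -> mu[x := w]
```

Here one_step coincides with the single-transition trace of x := 2 from the same starting memory, which is the instance of the containment shown in the text for v = 1 and w = 2.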


2.3 Overview of Release/Acquire Operational Semantics


Memory accesses in RA are more subtle than in SC. To address this we
adopt Kang et al.’s “view-based” machine, an operational presentation of
RA proven to be equivalent to the original declarative formulation of RA.
In this model, rather than the memory holding only the latest value written
to every variable, the memory accumulates a set of memory update messages for
each location. Each thread maintains its own view that captures which messages
the thread can observe, and is used to constrain the messages that the thread
may read and write. The messages in the memory carry views as well, which are
inherited from the thread that wrote the message, and passed to any thread that
reads the message. Thus views indirectly maintain a causal relationship between
messages in memory throughout the evolution of the system.

Fig. 1. Illustrations of a memory (top) and a trace (bottom), in the setting of two
memory locations, x and y. Top: A memory holding six messages. The timelines
are purposefully misaligned and not to scale to emphasize that timestamps for
different locations are incomparable and that only the order between them is
relevant. The graph structure that the views impose is illustrated by arrows
pointing between messages. Messages that are not dovetailed are set apart, e.g.
ν₃ dovetails with ν₂, which does not dovetail with ν₁. Bottom: A trace with
two transitions: α[⟨μ₁, ρ₁⟩ ⟨μ₂, ρ₂⟩]ω ∴ 5. The memory illustrated on top is ρ₂.
Messages and edges that are not part of a previous memory are highlighted. The
local messages are ν₂ and ν₃, and the rest are environment messages.

More concretely, causality is enforced by timestamping messages, thus placing
them on their location’s timeline. To capture the atomicity of RMWs, each
message occupies a half-open segment (q, t] on its location’s timeline, where
t is the message’s timestamp. It dovetails with a message at the same location
with timestamp q. An RMW “modifies” a message by dovetailing with it.

A view κ associates a timestamp κ(ℓ) to each location ℓ, obscuring the portion
of ℓ’s timeline before κ(ℓ). The view points to a message at ℓ with timestamp
κ(ℓ). A view ω dominates a view α, written α ≤ ω, if α(ℓ) ≤ ω(ℓ) for every ℓ.
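
As a minimal sketch of the domination order, assuming views are represented as Python dictionaries from location names to numeric timestamps over a common set of locations (a representation chosen here purely for illustration), domination is a pointwise comparison and the least view dominating two given views is their pointwise maximum. The names dominates and join are hypothetical.

```python
def dominates(omega, alpha):
    """True when alpha <= omega pointwise, i.e. omega dominates alpha."""
    return all(alpha[loc] <= omega[loc] for loc in alpha)

def join(kappa1, kappa2):
    """Least view dominating both arguments: the pointwise maximum."""
    return {loc: max(kappa1[loc], kappa2[loc]) for loc in kappa1}

alpha = {"x": 1.7, "y": 0.5}
omega = {"x": 1.0, "y": 3.5}
least = join(alpha, omega)   # {"x": 1.7, "y": 3.5}
```

Neither alpha nor omega dominates the other in this example, which shows that domination is only a partial order on views.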

Messages point to messages via the view they carry, and must point to
themselves. So when specifying a message, the value its view takes at its location
may be omitted. For example, assuming two locations, x and y, we denote by
x:1@(.5, 1.7]⟪y@3.5⟫ the message at location x that carries the value 1, occupies
the segment (.5, 1.7] on x’s timeline, and carries the view κ such that κ(x) = 1.7
and κ(y) = 3.5. An example memory is depicted on the top of Figure 1.

When a thread writes to ℓ, it must increase the timestamp its view associates
with ℓ and use its new view as the message’s view. The message’s segment
must not overlap with any other segment on ℓ’s timeline. In particular, only one
message can ever dovetail with a given message. A thread can only read from
revealed messages, and when it reads, its view increases as needed to dominate
the view of the loaded message. This may obscure messages at other locations.

Fig. 2. Depictions of a step during an execution of a litmus test, with the view
of the right thread changing from σ to σ′. The value each message carries is in
its bottom-right corner. Views are illustrated implicitly in the graph structure
that they impose. Obscured messages are faded. Left: As the right thread
loads 1 from y, it inherits the view of ε₁, obscuring ν₀. Right: The right
thread in (SB) loading 0 from x. Storing ε₁ did not obscure ν₀.

Revisiting the earlier litmus test, starting with a memory with a single message
holding 0 at each location, and with all views pointing to the timestamps of these
messages, suppose the right thread loaded 1 from y, as depicted on the left side of
Figure 2. Such a message can only be available if the left thread stored it. Before
storing 1 to y, the left thread stored 1 to x, obscuring the initial x message. The
right thread inherits this limitation through the causal relationship, so it will
not be able to load 0 from x. Therefore, RA forbids the outcome ⟨⟨⟩, ⟨1, 0⟩⟩.
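
The scenario just described can be sketched in Python. This is a simplified model under our own assumptions, not Kang et al.'s machine: views are dictionaries from locations to timestamps, the concrete timestamps and the lower bound of the initial segment are arbitrary choices, and the names Message, revealed and load are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    loc: str        # location the message was written to
    val: int        # value carried
    start: float    # open lower end q of the segment (q, t]
    stamp: float    # timestamp t, the closed upper end
    view: tuple     # view carried, as (location, timestamp) pairs

def revealed(view, msg):
    """A message is revealed to a view that has not moved past its timestamp."""
    return msg.stamp >= view[msg.loc]

def load(view, msg):
    """Read msg: the reader's view grows just enough to dominate msg's view."""
    if not revealed(view, msg):
        raise ValueError("cannot read an obscured message")
    carried = dict(msg.view)
    return {loc: max(t, carried.get(loc, t)) for loc, t in view.items()}

# Initial x message (value 0) and the two stores of the left thread.
x0 = Message("x", 0, -1.0, 0.0, (("x", 0.0), ("y", 0.0)))
x1 = Message("x", 1, 0.0, 1.0, (("x", 1.0), ("y", 0.0)))
y1 = Message("y", 1, 0.0, 1.0, (("x", 1.0), ("y", 1.0)))

right_view = {"x": 0.0, "y": 0.0}
after_load = load(right_view, y1)   # the right thread loads 1 from y
```

After the load, the right thread's view points to x1, so x0 is obscured and the value 0 can no longer be loaded from x, matching the forbidden outcome.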

In contrast, consider the litmus test known as store buffering: 


(x := 1 ; y?) || (y := 1 ; x?)    (SB)


By considering the possible interleavings, one can check that no execution in SC
returns ⟨0, 0⟩. However, in RA some do. Indeed, even if the left thread stores to
x before the right thread loads from x, the right thread’s view allows it to load
0, as depicted on the right side of Figure 2.
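
The SC half of this claim can be checked by brute force. The sketch below, written under the assumption that SC is modelled as a single shared memory with atomic interleaved steps, enumerates all six interleavings of (SB) and collects the pairs of loaded values. The instruction encoding and the names run_sc and sc_outcomes are hypothetical.

```python
from itertools import combinations

LEFT = (("store", "x", 1), ("load", "y"))
RIGHT = (("store", "y", 1), ("load", "x"))

def run_sc(schedule):
    """Run one interleaving; schedule lists which thread (0 or 1) steps next."""
    mem = {"x": 0, "y": 0}
    threads = (LEFT, RIGHT)
    pcs = [0, 0]
    results = [None, None]
    for t in schedule:
        instr = threads[t][pcs[t]]
        pcs[t] += 1
        if instr[0] == "store":
            mem[instr[1]] = instr[2]
        else:
            results[t] = mem[instr[1]]
    return tuple(results)

def sc_outcomes():
    """All (left load, right load) pairs over every interleaving of (SB)."""
    outcomes = set()
    for left_slots in combinations(range(4), 2):
        schedule = [0 if k in left_slots else 1 for k in range(4)]
        outcomes.add(run_sc(schedule))
    return outcomes
```

Every schedule performs at least one store before the last load, which is why the pair (0, 0) never appears among the SC outcomes.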

We can recover the SC behavior by interspersing fences between sequenced
memory accesses, which we model with FAA (z, 0) to a fresh location z. Thus,
compare (SB) to the store buffering with fences litmus test:


(x := 1 ; FAA (z, 0) ; y?) || (y := 1 ; FAA (z, 0) ; x?)    (SB+F)


Both of the FAA (z, 0) instructions store messages that must dovetail with the
message that they load from, and in doing so also inherit its view. They cannot both
dovetail with the same message because their segments cannot intersect. Thus,
one of them, say the one on the right, will have to dovetail with the other. In
this scenario, the view of the message that the left thread stores at z points to
the message it previously stored at x. When the right thread loads the message
from z it inherits this view, obscuring the initial message to x. Therefore, when
it later loads from x, it must load what the left thread stored. Thus, like in SC,
no execution in RA returns ⟨0, 0⟩.

3 Denotational Semantics for Release/Acquire


We start this section by explaining how we support first-class concurrent
composition (||) in the operational semantics of Release/Acquire (§3.1). In the rest of
the section we present the core of our denotational semantics. First, we present
our notion of a trace, adapted to RA, along with four basic rewrite rules that our
denotations are closed under (§3.2). Next, we define the denotations of the key
program constructs (§3.3). We then present further aspects of the denotational
semantics that make it more abstract: restrictions that traces in denotations
must uphold (§3.4) and three more rewrite rules under which denotations are
closed (§3.5). For completeness, we show how to give denotations to the whole
language standardly, using Moggi’s approach (§3.6).


3.1 First-class Concurrent Composition 


Kang et al.’s presentation assumes top-level parallelism, a common practice in
studies of weak-memory models. This comes at the cost of uniformity and
compositionality. In particular, the denotation ⟦M || N⟧ cannot be defined. We
resolve this by extending Kang et al.’s operational semantics to support first-class
parallelism by organizing thread views in an evolving view-tree, a binary tree
with view-labelled leaves, rather than in a fixed flat mapping. Thus, states that
accompany executing terms consist of a memory and a view-tree. In discourse,
we do not distinguish between a view-leaf and its label.

An initial state consists of a memory with a single message at each location, 
and a view which points to these messages’ timestamps. The example below 
shows how threads inherit their parent’s view upon activation and combine their 
views as they synchronize: 


Example. In the following, ⇝ is the execution step relation, ⇝* is its reflexive-
transitive closure, μ₀ is an initial memory, κ̇ is the κ-labelled view-leaf, T ⌢ R is
the view-tree that consists of a node connected to the view-trees T and R, and
ω₁ ⊔ ω₂ is the least view that dominates both ω₁ and ω₂:


⟨μ₀, α̇⟩, M ; (N₁ || N₂)  ⇝*  ⟨μ, κ̇⟩, N₁ || N₂  ⇝  ⟨μ, κ̇ ⌢ κ̇⟩, N₁ || N₂
    ⇝*  ⟨ρ, ω̇₁ ⌢ ω̇₂⟩, V₁ || V₂  ⇝  ⟨ρ, (ω₁ ⊔ ω₂)˙⟩, ⟨V₁, V₂⟩


First, M runs until it returns a value, which is discarded by the sequencing
construct. Next, the parallel composition N₁ || N₂ activates. The threads then
interleave executions, each with its associated side of the view-tree. Finally, once
both threads return a value, they synchronize.


Handling parallel composition as a first-class construct allows us to decompose
Write-Read Reordering (WR-Reord) (x := v) ; y? ↠ fst ⟨y?, (x := v)⟩, a
crucial reordering of memory accesses valid under RA but not under SC, into a
combination of Write-Read Deorder (WR-Deord) ⟨(x := v), y?⟩ ↠ (x := v) || y?
together with structural transformations and laws of parallel programming:


(x := v) ; y?
  ↠  snd ⟨(x := v), y?⟩            (Structural)
  ↠  snd ((x := v) || y?)          (WR-Deord)
  ↠  snd (swap (y? || (x := v)))   (Par. Prog. Law: Symmetry)
  ↠  fst (y? || (x := v))          (Structural)
  ↠  fst ⟨y?, (x := v)⟩            (Par. Prog. Law: Sequencing)

This provides a separation of concerns: the components of this decomposition are
supported by our semantics using independent arguments. It also sheds light
on the interesting part, as they are all valid under SC except for (WR-Deord).


3.2 Traces for Release/Acquire 


Adapting Brookes’s SC-traces, our RA-traces also include a sequence of
transitions ξ, each transition a pair of RA memories; and a return value r. Intuitively,
these play a similar role here, formally grounded in analogs to the stutter and
mumble rewrite rules. Seeing that the operational semantics only adds messages
and never modifies them, we require that every memory snapshot in the sequence
ξ be contained in the subsequent one, whether it be within or across transitions.
A message added within a transition is a local message; otherwise it is an
environment message. We call the first memory in ξ’s first transition its opening
memory, and the second memory in ξ’s last transition its closing memory.
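
The containment requirement and the local/environment split can be illustrated with a short Python sketch. It assumes, for illustration only, that a memory is a frozenset of opaque message names and that a sequence is a non-empty tuple of (before, after) pairs; the function names are hypothetical.

```python
def snapshots(seq):
    """All memory snapshots of a sequence, in order of appearance."""
    return [mem for transition in seq for mem in transition]

def is_accumulating(seq):
    """Each snapshot is contained in the next, within and across transitions."""
    snaps = snapshots(seq)
    return all(a <= b for a, b in zip(snaps, snaps[1:]))

def local_messages(seq):
    """Messages added within some transition."""
    local = set()
    for before, after in seq:
        local |= after - before
    return frozenset(local)

def environment_messages(seq):
    """Every other message of the closing memory."""
    closing = seq[-1][1]
    return closing - local_messages(seq)

seq = (
    (frozenset({"e1"}), frozenset({"e1", "v2"})),
    (frozenset({"e1", "v2", "e2"}), frozenset({"e1", "v2", "e2", "v3"})),
)
```

In this example v2 and v3 are local, while e1 (present in the opening memory) and e2 (added between the two transitions) are environment messages.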

In addition, RA-traces include an initial view α, declaring which messages are
relied upon to be revealed in ξ’s opening memory; and a final view ω, declaring
which messages are guaranteed to be revealed in ξ’s closing memory. We ground
these intuitions formally in the rewind and forward rewrite rules below.

We write the trace as α[ξ]ω ∴ r. See an illustration on the bottom of Figure 1.


Stutter & Mumble. We define the stutter (St) and mumble (Mu) rewrite rules:


(St)  α[ξ η]ω ∴ r  →  α[ξ ⟨μ, μ⟩ η]ω ∴ r
(Mu)  α[ξ ⟨μ, ρ⟩ ⟨ρ, θ⟩ η]ω ∴ r  →  α[ξ ⟨μ, θ⟩ η]ω ∴ r


As in Brookes’s semantics, their role is to make the semantics more abstract by
divorcing the length of the sequence from the individual steps taken in the
operational semantics, while maintaining the transitions’ Rely/Guarantee character.


Rewind & Forward. The rewind (Rw) rewrite rule establishes the fact that the
term only relies on certain messages being revealed, not on messages being
obscured. The rewind rule modifies the initial view, making it point to earlier
messages on the timelines. Thus, relied upon messages will remain available
after the rewrite. Similarly, the forward (Fw) rewrite rule establishes the fact that
the term only guarantees that certain messages are revealed. The forward rule
modifies the final view, making it point to later messages on the timelines. Thus,
any message guaranteed to be available was already guaranteed beforehand. The
rules are schematically depicted in Figure 3.

Fig. 3. Schematic depictions of the rewind and forward rewrite rules, focusing on
a single location, where the initial/final view points to ν before and points to ε
after. The messages ν and ε may coincide, dovetail, or be separated. Left: The
initial view α is “rewound” to α′. Right: The final view ω is “forwarded” to ω′.


3.3 Introducing Denotations for RA 


We present denotations of key constructs of the programming language. By
referring to the notion of a closed set below, we mean a set that is closed under
certain rewrite rules, such as stutter, mumble, rewind, and forward from §3.2.


Pure. A pure (i.e. effect-free) computation guarantees a returned value, and
otherwise can only guarantee what it relies on. For example, define ⟦2 + 3⟧ as
the least closed set with all traces of the form κ[⟨μ, μ⟩]κ ∴ 5.


Sequence. In denoting sequential composition we must make sure that the first
component does not obscure any message that the second component relies on.
Thus, define ⟦⟨M, N⟩⟧ as the least closed set with all traces of the form
α[ξ η]ω ∴ ⟨r, s⟩, where there exists a view κ such that α[ξ]κ ∴ r ∈ ⟦M⟧ and
κ[η]ω ∴ s ∈ ⟦N⟧. The existence of the revealed messages is implicit: ξ’s closing
memory must be contained in the memory that follows it, which is η’s opening
memory. The definition of ⟦M ; N⟧ is the same, except that the first component of
the returned pair is discarded. That is, with traces of the form α[ξ η]ω ∴ s.
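
The view-matching condition in this definition can be sketched in Python. The sketch only produces the generating traces of a pair, before closing under rewrite rules, and rests on assumptions of our own: a trace is a tuple (initial view, sequence, final view, value), views are hashable tuples, memories are frozensets of message names, and the function name sequence is hypothetical.

```python
def sequence(P, Q):
    """Generating traces of a pair from trace sets P (first) and Q (second)."""
    out = set()
    for (alpha, xi, kappa, r) in P:
        for (kappa2, eta, omega, s) in Q:
            views_match = kappa == kappa2
            # The closing memory of xi must be contained in eta's opening memory.
            memories_grow = xi[-1][1] <= eta[0][0]
            if views_match and memories_grow:
                out.add((alpha, xi + eta, omega, (r, s)))
    return out

m0 = frozenset({"x0"})
m1 = frozenset({"x0", "x1"})
a = (("x", 0),)          # view pointing to the initial message x0
k = (("x", 1),)          # view pointing to the stored message x1

P = {(a, ((m0, m1),), k, "unit")}          # a store that adds x1
Q = {(k, ((m1, m1),), k, 1),               # a load of x1
     (a, ((m1, m1),), a, 0)}               # a load of x0, relying on view a
pairs = sequence(P, Q)
```

Only the load of x1 composes with the store here: the trace that loads x0 relies on an initial view that the store's final view has already moved past.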


Parallel. Threads composed in parallel rely on the same preceding sequential
environment and guarantee to the same succeeding sequential environment. Thus,
define ⟦M₁ || M₂⟧ as the least closed set with all traces of the form
α[ξ]ω ∴ ⟨r₁, r₂⟩, where there exist sequences ξ₁ and ξ₂ such that ξ is obtained by
interleaving their transitions, and α[ξᵢ]ω ∴ rᵢ ∈ ⟦Mᵢ⟧ (for i ∈ {1, 2}).


Dereference. We define ⟦ℓ?⟧ to be the least closed set with all traces of the form
α[⟨μ, μ⟩]ω ∴ v, where ℓ:v@(q, α(ℓ)]⟪κ⟫ ∈ μ for some timestamp q and view κ, and
both α ≤ ω and κ ≤ ω.


Assignment. Define ⟦ℓ := v⟧ as the least closed set with all traces of the form
α[⟨μ, ρ⟩]ω ∴ ⟨⟩ where ρ is obtained by adding the message ℓ:v@(q, ω(ℓ)]⟪ω⟫ to μ
for some timestamp q, and α ≤ ω.


Read-modify-write. The definition of ⟦FAA (ℓ, w)⟧ combines the two above, along
with a dovetailing requirement. Specifically, it is the least closed set with all
traces of the form α[⟨μ, ρ⟩]ω ∴ v, where ℓ:v@(q, α(ℓ)]⟪κ⟫ ∈ μ for some timestamp
q and view κ, both α ≤ ω and κ ≤ ω, and ρ is obtained by adding the message
ℓ:(v+w)@(α(ℓ), ω(ℓ)]⟪ω⟫ to μ. The semantics of other RMWs is defined similarly.

Example. We show that ⟦ℓ := v ; v⟧ ⊆ ⟦ℓ := v ; ℓ?⟧. When sequencing two traces,
the final view of the first must match the initial view of the second, so traces in
⟦ℓ := v ; v⟧ have the form α[⟨μ, ρ⟩ ⟨θ, θ⟩]ω ∴ v, where ρ is obtained by adding the
message ℓ:v@(q, ω(ℓ)]⟪ω⟫ to μ for some timestamp q, and α ≤ ω. Since ω points
to this added message, and since ρ ⊆ θ as memories along a trace’s sequence,
ω[⟨θ, θ⟩]ω ∴ v ∈ ⟦ℓ?⟧. By sequencing, α[⟨μ, ρ⟩ ⟨θ, θ⟩]ω ∴ v ∈ ⟦ℓ := v ; ℓ?⟧.


3.4 Correspondence to the Operational Semantics 


Traces in denotations, if unconstrained, may represent behaviors that include 
operationally unreachable states. Forbidding such redundant traces eliminates a 
source of differentiation between denotations, thus increasing their abstraction. 


Reachable states. Consider the transformation x? ; y? ↠ y?, a consequence of
the RA-valid Irrelevant Read Elimination (R-Elim) x? ; ⟨⟩ ↠ ⟨⟩ and structural
equivalences. Consider the state S that consists of the memory at the top of
Figure 1 and the view that points to ν₃ and ε₂. The only step x? ; y? can take
from the state S is to load ν₃, inheriting the view that ν₃ carries, which changes
the thread’s view to point to ε₃. Only ε₃ is available in the following step, which
means the term returns 3. In contrast, starting from S, the term y? can load
from ε₂ to return 7. This analysis does not invalidate the transformation because
the state S is unreachable by an execution starting from an initial state, and
should therefore be ignored when determining observable behaviors.


Internalizing invariants. Just as we ignore unreachable states in the operational 
semantics, we discard “unreachable” traces to refine our denotational semantics. 
We consider a state to be valid if it adheres to the following invariants. 


Scattering: segments in memory never overlap. 

Pointing: views always point to messages. 

Dominating: views always dominate the views of the messages to which they 
point. This invalidates the state S above, because the view of the thread 
does not dominate the view of v3 even though it points to it. 

Descending: a path from a message along the view-induced graph structure
cannot end in another message with a greater timestamp at the same location.
This is demonstrated both positively and negatively in Figure 4.

Acyclicity: a cycle along the view-induced graph structure consists solely of mes- 
sages which have the smallest timestamp on their timeline. 


Memory snapshots in traces are required to obey each of the invariants above.
The initial and final view must point to and dominate the opening and closing
memory respectively. This means that there must be a message to load that
allows the initial and final view to be equal, and we obtain ⟦x? ; ⟨⟩⟧ ⊇ ⟦⟨⟩⟧.

We also uphold requirements that correspond to the relation between the
states across a possibly-interrupted series of steps in the operational semantics:


Accumulating: the memory after contains the memory before. We require that
every memory snapshot contains the one before it.

Delimiting: if the view-trees before and after are leaves, then the view after
dominates the view before, and the view of any written message dominates
the view before and is dominated by the view after. We impose the analogous
requirement on the initial and final views, and on the local messages.

Fig. 4. Two variations on the memory illustrated in Figure 1. Top: This can
function as a memory snapshot in a trace. It demonstrates that the views of
messages along a timeline do not have to be ordered: ε₂ appears earlier than ε₃ on
y’s timeline but points to a later message on x’s timeline. Bottom: This cannot
function as a memory snapshot in a trace, because it contains an ascending path.
Intuitively, no thread could have written ε₂ because the view that ε₂ carries
indicates that the thread would have already “known” about ν₃ and therefore,
following the causality chain, about ε₃ as well. Thus, the thread would have been
forbidden from picking ε₂’s timestamp.


The trace in Figure 1 adheres to the invariants and relationships we have listed.


Concrete operational correspondence. We call the rewrite rules that were
defined in §3.2 concrete because they maintain a certain concrete interpretation of
traces. To see this, consider the operational semantics for RA augmented with
an additional kind of step, which any term can take. The only change along this
step is that a view in the view-tree inherits the view from a message that is
available to it. This addition does not change the observable behaviors of whole
programs, and maintains the above invariants.

Each trace in the denotations of §3.3, if closed only under the concrete rewrite
rules, corresponds to an interrupted execution in the augmented operational
semantics. The correspondence is similar to that from Brookes’s semantics in
terms of the sequence of transitions and return value. The initial and final views
determine the views at the beginning and the end of the interrupted execution.

The introduction of the rewrite rules in §3.5 will mean that traces do not
have such a clear operational interpretation. The key to our proof of adequacy
is to partially recover this operational correspondence in terms of the overall
observable behaviors.

Fig. 5. Schematic depiction of the tighten rewrite rule, that focuses on a
particular memory snapshot within the trace, in the setting of k+1 locations. The
message ν is “tightened” to ν′, such that for each i it points to δᵢ instead of εᵢ.
This includes the case that δᵢ and εᵢ are the same message in some locations.


3.5 Abstract Rewrite Rules 


Transitions in RA traces consist of sets of messages, which record much more 
information about the operational execution than the mappings from locations 
to values we had in SC. This makes the trace-based semantics too concrete. We 
resolve the memory-concreteness issue by introducing three abstract rewrite rules 
that obfuscate information about local messages. This makes the denotations 
more abstract by blurring the distinctions that denotations can make. 


Tighten. Recall the transformation (WR-Deord) that we wish to support. Let
τ₁ ∈ ⟦x := v⟧ and τ₂ ∈ ⟦y?⟧, such that they compose sequentially to form a trace
from ⟦⟨(x := v), y?⟩⟧. Then τ₁’s final view κ must equal τ₂’s initial view. The
view κ dominates the view σ of the local message ν₁ stored by τ₁, and κ cannot
obscure the message ν₂ from which τ₂ loaded its value. Thus, σ cannot obscure
ν₂. In contrast, consider τ₁ and τ₂ that compose in parallel to form a trace from
⟦(x := v) || y?⟧. Here, the view of the local message may very well obscure the
loaded message. Indeed, the final view of τ₁ may dominate the initial view of τ₂.

To resolve this, observe that the purpose of recording views in messages is to
encumber their loaders. Under this perspective, the view of a local message
guarantees to the environment that loading the local message will keep certain messages
revealed. Therefore, making the view larger only weakens the guarantee. Thus,
we introduce the tighten (Ti) rewrite rule that makes the view of a local
message larger. The rule is depicted in Figure 5, and Figure 6 provides a concrete
example. Using tighten, we can show that ⟦⟨(x := v), y?⟩⟧ ⊇ ⟦(x := v) || y?⟧.


Absorb. Recall the transformation (WW-Elim) that we wish to support. To show
this we aim to replicate, as far as we can, the reasoning we have used to show
⟦x := v ; x := w⟧ ⊇ ⟦x := w⟧ in Brookes’s semantics. Recall that, to use mumble,
we made the memories match across the two transitions of ⟦x := v ; x := w⟧.
Doing so here, we end up with two local messages, whereas traces from ⟦x := w⟧
only have a single local message. Roughly speaking, the equality concerning SC
memories μ[x := v][x := w] = μ[x := w] does not transfer to RA where memory,
by accumulating messages, is more concrete. We resolve this by adding the absorb
(Ab) rewrite rule, which replaces two dovetailed local messages with one that
carries the second message’s value. The rule is depicted in Figure 7, and
Figure 8 provides a specific example.

Fig. 6. A possible result from rewriting the trace from Figure 1 using tighten.
Since ν₂ is local in the trace from Figure 1, tighten can advance its view to point
to ε₃ instead of ε₁. The same replacement is applied throughout the trace’s
sequence, not just the closing memory.


Dilute. There is another known family of transformations that are valid under
RA, yet that we cannot justify with the rules we presented. These introduce
non-modifying atomic updates, such as Read to FAA (R-FAA) ℓ? ↠ FAA (ℓ, 0).

Running within some context, FAA (ℓ, 0) reads a message ν, to which it
dovetails another message ε with the same value. It is possible that some β dovetails
with ε later in the execution. In the same context, we can simulate this behavior
with ℓ? instead, by having the context provide ν′ instead of ν, with the
difference that it takes up the same segment that ν and ε have taken up combined. If
there is a β as mentioned, it can now dovetail with ν′ to the same effect. In this
scenario, ν is an environment message, but we must also account for the case
that it is local to allow for composition, such as in ℓ := v ; ℓ? ↠ ℓ := v ; FAA (ℓ, 0).

We internalize the idea behind this argument as the dilute (Di) rewrite rule,
in which a message is replaced by two messages that together occupy the same
segment, the second being a local message that cannot appear before the first in
the trace and must carry the same value. With dilute, ⟦ℓ?⟧ ⊇ ⟦FAA (ℓ, 0)⟧. The
rule is depicted in Figure 7, and Figure 9 provides a specific example.


3.6 Monadic Presentation 


One of the contributions of this work is to bridge research of weak-memory
models with Moggi’s monad-based approach to denotational semantics. In
this approach, one starts by defining a monad, which has three components. The
first associates to every set X, which we think of as representing returned values,
a set TX representing computations that return values from X. In our case,
TX consists of countable sets of traces closed under rewrite rules.

Fig. 7. Schematic depictions of the absorb (left) and dilute (right) rewrite rules,
that focus on the segment of the dovetailed messages together with all pointers
into and out of them, within a particular memory snapshot. The circular cloud
represents the subset of the memory that the messages in focus are pointing to,
showing that they all have the same view. The elliptical clouds represent views
(including the initial and final view, as well as other messages) that point to each
of the dovetailing messages. Left: The message ν is “absorbed” into the message
ε to become ε′. No view may point to ν. Right: The message ν′ “dilutes” into
ν and ε. While ε must be a local message, ν and ν′ can appear anywhere in the
trace’s sequence, as long as they appear in the same places in the sequence, and
that ε does not appear before. The views that point to ν′ before diluting can
point either to ν or to ε after diluting.

Denotations are then defined according to their typing judgments. For
example, a, b : Loc ⊢ ⟨a, b?⟩ : Loc × Val means that in the context that the free
variables a and b are locations, the term ⟨a, b?⟩ is a location-value pair. Given
a function γ that maps a and b to locations, ⟦⟨a, b?⟩⟧ γ ∈ T (Loc × Val). For
Γ ⊢ M : A and Γ ⊢ N : A, we generalize containment ⟦N⟧ ⊇ ⟦M⟧ pointwise:
if γ maps variables in Γ appropriately by their type, then ⟦N⟧ γ ⊇ ⟦M⟧ γ. This
degenerates when Γ is empty, i.e. when M and N are closed terms.

The second monad component is a function return_X : X → TX that maps values
to pure computations that return that value. The third component sequences
computations, such that the latter depends on the value returned by the
former: (>>=_{X,Y}) : TX × (X → TY) → TY. Omitting the indices, the monad
components must satisfy certain axioms that formalize the stated intuition:
return r >>= f = f(r), P >>= return = P, and (P >>= f) >>= g = P >>= λr. (f(r) >>= g).

In our case, we define return r as the least closed set with all traces of the
form κ[⟨μ, μ⟩]κ ∴ r; and P >>= f as the least closed set with all traces of the form
α[ξ η]ω ∴ s, where α[ξ]κ ∴ r ∈ P and κ[η]ω ∴ s ∈ f(r) for some κ.


Denotations. This approach comes ready-made with denotations for standard
language constructs. For example, ⟦⟨M, N⟩⟧ γ := ⟦M⟧ γ >>= λr. (⟦N⟧ γ >>= λs. return ⟨r, s⟩).
Similarly, ⟦match M with ⟨a, b⟩. N⟧ γ := ⟦M⟧ γ >>= λ⟨r, s⟩. ⟦N⟧ γ[a ↦ r][b ↦ s],
where γ[a ↦ r] is obtained from γ by mapping a to r. Pure computations use
the return function, e.g. ⟦v⟧ = return v.

Program effects can be modularly introduced in this approach, such as
memory access, where ⟦ℓ := v⟧ ∈ T{⟨⟩} and ⟦ℓ?⟧, ⟦FAA (ℓ, v)⟧ ∈ T Val; and
parallel composition, a function (|||) : TX × TY → T(X × Y) with which
⟦M || N⟧ γ := ⟦M⟧ γ ||| ⟦N⟧ γ. The definition remains the same: we obtain traces
in P ||| Q by interleaving transitions and pairing returned values of traces with
matching views, one from P and one from Q.

Fig. 8. […] in the same transition, so the result of rewriting by absorb here can be
obtained by stretching ν₃’s segment to cover ν₂’s segment.

Adhering to left-to-right evaluation both operationally and denotationally,
M := N is equivalent to match ⟨M, N⟩ with ⟨a, b⟩. a := b. In traces of assignment,
the added local message is free to dovetail with a previous message, unlike in
RMW traces where it must. Therefore, we have ⟦ℓ := (ℓ? + v)⟧ ⊇ ⟦FAA (ℓ, v)⟧.


Structural reasoning. Among the general results and proof techniques this ap- 
proach supplies are structural equivalences. These are denotational equations 
that hold due to the properties of the core calculus, and are preserved by mod- 
ular expansions with program effects. For instance, if K is effect-free, then 
[if K then M ; N else M ; N’] = [M ; if K then N else N’ ]. Equivalences such as 
this one may otherwise require challenging ad-hoc proofs [e.g. 

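More generally, structural reasoning composes to derive further equivalences. For example, from $\llbracket \langle\rangle \rrbracket = \llbracket \ell? ; \langle\rangle \rrbracket$ and structural equivalences, namely “left neutrality” $\llbracket K \rrbracket = \llbracket \langle\rangle ; K \rrbracket$ and “associativity” $\llbracket (M ; N) ; K \rrbracket = \llbracket M ; (N ; K) \rrbracket$:

$$\llbracket K \rrbracket = \llbracket \langle\rangle ; K \rrbracket = \llbracket (\ell? ; \langle\rangle) ; K \rrbracket = \llbracket \ell? ; (\langle\rangle ; K) \rrbracket = \llbracket \ell? ; K \rrbracket \qquad (\star)$$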


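Structural reasoning generalizes to program transformations. For example, $(\gg=)$ is monotonic, so we can also derive:

$$\llbracket \langle\rangle \rrbracket = \llbracket \ell? ; \langle\rangle \rrbracket = \llbracket \ell? \rrbracket \gg= \lambda v.\, \llbracket \langle\rangle \rrbracket \supseteq \llbracket \mathrm{FAA}(\ell, 0) \rrbracket \gg= \lambda v.\, \llbracket \langle\rangle \rrbracket = \llbracket \mathrm{FAA}(\ell, 0) ; \langle\rangle \rrbracket$$

Since $(|||)$ is also monotonic, we can use this to show that $\llbracket \text{(SB)} \rrbracket \supseteq \llbracket \text{(SB+)} \rrbracket$.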
Fig. 9. A possible result from rewriting of the trace from using dilute.
The message €, from was replaced with €}, with the same value 1. The 
local message $—which takes up the rest of the missing space left behind by 
€;—always appears with e1, dovetailing with it and carrying the same value. 
The message €2, that used to dovetail with €1, now dovetails with 8. 




Higher order. An important aspect of a programming language is its facilitation 
of abstraction. Higher-order programming is a flexible instance of this, in which 
programmable functions can take functions as input and return functions as 
output. Moggi’s approach supports this feature out-of-the-box, in such a way 
that does not complicate the rest of the semantics, as the first-order fragment 
of the semantics need not change to include it. 

Every value returned by an execution has a semantic presentation which 
we use as the returned value in traces. The semantic and syntactic values are 
identified in the first-order fragment, but different syntactic functions may have 
the same semantics, so the identification does not extend to higher-order. 

We classify a term as a program if it is closed (every variable occurrence is 
bound) and of ground type (all functions are applied to arguments). This defi- 
nition is in line with the expectation that a program should return a concrete 
result that the end-user can consume. Thus, we only consider observable behav- 
iors of programs. Transformations only need to be valid when applied within 
programs. Programs degenerate to closed terms in the first-order fragment. 


4 Main Results 


We present the main results that we have proven about our denotational semantics. Moggi’s semantic toolkit features ubiquitously in their proofs.


Compositionality. In its most basic form, this key feature of denotational semantics means that a program term’s denotation is defined using the denotations of its immediate subterms. We have used this in $(\star)$. Since in our case denotations are sets, where each element represents a possible behavior of the term, we are interested in establishing a directional generalization of compositionality:

Lemma 1. If $\llbracket M \rrbracket \subseteq \llbracket N \rrbracket$ then $\llbracket \Xi[M] \rrbracket \subseteq \llbracket \Xi[N] \rrbracket$ for any program context $\Xi[-]$.


Compositionality is a consequence of the monadic design of the semantics, which uses monotonic operators, and is not substantially different from previous work [e.g. 20].
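To make the role of monotonicity concrete, the following sketch checks one instance of the implication on a toy model in which a denotation is a finite set of integer results and a context is a continuation. It assumes that model only for illustration; `Outcomes`, `Context`, `and_then` and `refines` are hypothetical names, not the paper's definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>

using Outcomes = std::set<int>;                // toy denotation: a set of results
using Context = std::function<Outcomes(int)>;  // toy context: what runs afterwards

// Sequencing in the toy model: run p, then continue with k on each result.
Outcomes and_then(const Outcomes& p, const Context& k) {
    Outcomes out;
    for (int r : p) {
        Outcomes next = k(r);
        out.insert(next.begin(), next.end());
    }
    return out;
}

// True when every behavior in a is also a behavior in b (a is a subset of b).
bool refines(const Outcomes& a, const Outcomes& b) {
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

Because `and_then` only ever takes unions over the elements of its first argument, enlarging that argument can only enlarge the result; this is the toy counterpart of the monotonic operators mentioned above.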


Observability correspondence. The abstract rewrite rules break the direct correspondence between traces and interrupted executions. For example, in our analysis of (WW-Elim), by using absorb, we ended up with a trace in which only one message is added even though the program term adds two messages.

Still, some connection must remain to obtain a proof of adequacy. In partic- 
ular, we would like traces to correspond to observable behavior of programs. In 
one direction, an even stronger property holds, known as soundness: 


Lemma 2. For every execution of a program $M$ in the operational semantics of RA, there exists $\alpha[\langle\mu,\rho\rangle]\omega \therefore r \in \llbracket M \rrbracket$ that matches the execution: $\langle\alpha,\mu\rangle$ is the initial state, $\langle\omega,\rho\rangle$ is the final state, and $r$ matches the value returned.


To prove soundness, we take a trace where transitions correspond to the memory- 
accessing execution steps, and then use mumble to obtain a single transition. 
Ignoring the final state, the correspondence holds in the other direction too: 


Lemma 3. For every program $M$ and $\alpha[\langle\mu,\rho\rangle]\omega \therefore r \in \llbracket M \rrbracket$ there is an observable behavior of $M$ with initial state $\langle\alpha,\mu\rangle$ and return value matching $r$.


The lack of correspondence with the final state is an artifact of the concreteness- 
abstraction divergence between the operational and denotational semantics. Due 
to this divergence, it is significantly more challenging to establish this direction 
of the correspondence than in previous work. 


Overcoming the concreteness-abstraction hurdle. The most technically challenging step in proving Lemma 3 is to prove that the application of abstract rewrite rules can be deferred to the end. We define the basic denotation of a term $M$ by $\underline{\llbracket M \rrbracket}$, which is the denotation were it defined using only the concrete rewrite rules. Denoting its closure under the abstract rewrite rules by $\underline{\llbracket M \rrbracket}^{\dagger}$, we claim:


Lemma 4. If $M$ is a program, then $\underline{\llbracket M \rrbracket}^{\dagger} = \llbracket M \rrbracket$.


Thus, to obtain all of the traces that result from the regular denotational con- 
struction, where all of the rewrite rules are applied throughout the entire de- 
notational construction, it is enough to close only under the concrete rewrite 
rules as the denotation of a program is built-up from its subterms, applying the 
abstract rewrite rules only at the top level. 

The intuition that guides the inductive proof of Lemma 4 is that the abstract rewrite rules can be percolated out. To get the main idea across while keeping the discussion self-contained, we focus on the $\underline{\llbracket M_1 \parallel M_2 \rrbracket}^{\dagger} \supseteq \llbracket M_1 \parallel M_2 \rrbracket$ case.

Let $\tau \in \llbracket M_1 \parallel M_2 \rrbracket$. By definition, $\tau$ is obtained by first composing some $\tau_1 \in \llbracket M_1 \rrbracket$ in parallel with some $\tau_2 \in \llbracket M_2 \rrbracket$, i.e. interleaving transitions and pairing return values, and then rewriting the resulting trace $\hat\tau$ with concrete and abstract rules. By the inductive hypothesis, $\underline{\llbracket M_i \rrbracket}^{\dagger} \supseteq \llbracket M_i \rrbracket$. So $\tau_i \in \underline{\llbracket M_i \rrbracket}^{\dagger}$, meaning that $\tau_i$ is the result of rewriting some $\tau_i' \in \underline{\llbracket M_i \rrbracket}$ with abstract rules.

To warm up, we first address the case where $\tau_1' \xrightarrow{\mathsf{Ab}} \tau_1$ and $\tau_2' = \tau_2$. We would hope, naively, that we can compose $\tau_1'$ with $\tau_2'$ to obtain some $\tau' \in \underline{\llbracket M_1 \parallel M_2 \rrbracket}$ such that $\tau' \xrightarrow{\mathsf{Ab}} \hat\tau$, and thus $\tau'$ rewrites to $\tau$. However, they do not compose because $\tau_1'$ has two local messages, and $\tau_2'$ has only the one environment message that matches the result of “absorbing” the two messages. Rather, $\tau_1'$ can compose with a trace $\tau_2''$ which is equal to $\tau_2'$ except for having the required two environment messages instead of the combined one.

We formalize this by introducing a dual auxiliary rewrite rule $\bar{x}$ for each abstract rule $x$. For example, the dual of absorb is expel, which splits up an environment message dually to how absorb combines local messages. The auxiliary rewrite rules keep us within the basic denotations:


Lemma 5. If $\tau \in \underline{\llbracket M \rrbracket}$ and $\tau \xrightarrow{\bar{x}} \pi$ for some auxiliary rule $\bar{x}$, then $\pi \in \underline{\llbracket M \rrbracket}$.


Then we apply $\tau_2' \xrightarrow{\mathsf{Ex}} \tau_2'' \in \underline{\llbracket M_2 \rrbracket}$, and obtain the required $\tau'$ by composing $\tau_1'$ in parallel with $\tau_2''$. This process of applying the dual rewrite in order to percolate an abstract rewrite out works for sequential composition too. We summarize:


Lemma 6. If $\pi' \xrightarrow{x} \pi$ for some abstract $x$, and $\pi$ composes in parallel with $\sigma$ to obtain $\tau$, then there exist $\sigma \xrightarrow{\bar{x}} \sigma'$ and $\tau' \xrightarrow{x} \tau$, such that $\pi'$ composes in parallel with $\sigma'$ to obtain $\tau'$. Similarly for sequential composition.


In the case where there are more abstract rewrite rules needed to obtain $\tau_1$ from $\tau_1'$, we can repeat the process. Yet two problems remain.

The first problem is that $\tau$ is obtained from $\tau' \in \underline{\llbracket M_1 \parallel M_2 \rrbracket}$ by both concrete and abstract rewrites, starting with the abstract rewrites that we have “peeled off” $\tau_1$. To show that $\tau \in \underline{\llbracket M_1 \parallel M_2 \rrbracket}^{\dagger}$, we need the concrete rewrites to come before the abstract rewrites.

The second problem appears once we remove our simplifying assumption that $\tau_2' = \tau_2$. In the general case, we obtain $\tau_2''$ from $\tau_2'$ using abstract rewrites followed by auxiliary rewrites. If we could replace the sequence of rewrites with one in which the abstract rewrites follow the auxiliary rewrites, then $\tau_2'$ could be rewritten with auxiliary rules to some $\tau_2''' \in \underline{\llbracket M_2 \rrbracket}$ by using Lemma 5, which in turn could be rewritten with abstract rewrites to $\tau_2'' \in \underline{\llbracket M_2 \rrbracket}^{\dagger}$. This would allow the proof to continue by repeating the process on the other side.

Both problems are solved by commuting the abstract rewrites outwards:


Lemma 7. For any rewrite sequence starting with $\tau$ and ending with $\pi$, there exists one in which all of the abstract rewrites appear last.


Thus, we can do as we planned and repeat the process on the other side, “peeling off” the abstract rewrites from $\tau_2''$ to obtain $\tau_2''' \in \underline{\llbracket M_2 \rrbracket}$, and rewriting $\tau_1'$ with the dual auxiliary rules in lockstep, resulting in some $\tau_1''' \in \underline{\llbracket M_1 \rrbracket}$ by Lemma 5. By Lemma 6, these compose in parallel to some $\tau''' \in \underline{\llbracket M_1 \parallel M_2 \rrbracket}$ that rewrites with concrete and abstract rules to $\tau'$, and thus to $\tau$. By Lemma 7 we can rewrite $\tau'''$ with concrete rules to some $\tau'''' \in \underline{\llbracket M_1 \parallel M_2 \rrbracket}$ first, and with abstract rules afterwards, obtaining $\tau \in \underline{\llbracket M_1 \parallel M_2 \rrbracket}^{\dagger}$.


Having established Lemma 4, the rest is relatively straightforward. First, traces in basic denotations correspond to interrupted executions, and in particular, an analog of Lemma 3 holds for basic denotations:


Lemma 8. For every program $M$ and $\alpha[\langle\mu,\rho\rangle]\omega \therefore r \in \underline{\llbracket M \rrbracket}$ there is an observable behavior of $M$ with initial state $\langle\alpha,\mu\rangle$ and return value matching $r$.


Next, it is clear from their definition that the abstract rules do not change the number of transitions. Thus, thanks to Lemma 4, the single-transition traces in $\llbracket M \rrbracket$ are the result of rewriting single-transition traces in $\underline{\llbracket M \rrbracket}$ by abstract rules, which correspond to observable behaviors of $M$ by Lemma 8.

Lemma 3 follows from the fact that the abstract rules preserve the corre- 
spondence between traces and observable behavior of programs. For example, 
due to absorb there is a trace which only adds one message in the denotation of a 
program that adds two messages; yet the initial view, the opening memory, and 
the returned value are maintained. The tighten rule similarly preserves these. In 
both cases, the execution exhibiting the behavior can remain unchanged. The 
dilute rule may replace an initial message’s timestamp with a smaller one, in 
which case the execution exhibiting the behavior needs to use the new times- 
tamp accordingly, but otherwise remains the same. 


Adequacy. The central result is (directional) adequacy, stating that denotational 
approximation corresponds to refinement of observable behaviors: 


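Theorem 9. If $\llbracket M \rrbracket \subseteq \llbracket N \rrbracket$, then for all program contexts $\Xi[-]$, every observable behavior of $\Xi[M]$ is an observable behavior of $\Xi[N]$.


In particular, $\llbracket M \rrbracket \subseteq \llbracket N \rrbracket$ implies that $N \twoheadrightarrow M$ is valid under RA, because the effect of applying it is unobservable.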

Adequacy follows immediately from the above results. Indeed, using soundness, an observable behavior of $\Xi[M]$ corresponds to a single-transition $\tau \in \llbracket \Xi[M] \rrbracket$; by the assumption and compositionality $\tau \in \llbracket \Xi[N] \rrbracket$; and using the other direction, $\tau$ corresponds to an observable behavior of $\Xi[N]$.
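The argument can be laid out as a chain of implications. In the sketch below, $b$ is a name introduced here for an arbitrary observable behavior, and $\Xi$ stands for the program context; the lemma numbers are those stated above.

```latex
\begin{align*}
b \text{ is an observable behavior of } \Xi[M]
  &\implies \exists\, \tau \in \llbracket \Xi[M] \rrbracket
     \text{ single-transition, matching } b
     && \text{(soundness, Lemma 2)} \\
  &\implies \tau \in \llbracket \Xi[N] \rrbracket
     && \text{(} \llbracket M \rrbracket \subseteq \llbracket N \rrbracket
        \text{ and Lemma 1)} \\
  &\implies b \text{ is an observable behavior of } \Xi[N]
     && \text{(Lemma 3)}
\end{align*}
```

Only the last step ignores the final state of $\tau$, which is why Lemma 3 suffices there even though it says nothing about final states.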


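Higher-order subtleties. When applying the above results in the presence of higher order, one must pay attention to the program assumption. Indeed, suppose $\llbracket M \rrbracket \supseteq \llbracket M' \rrbracket$. Compositionality does not entail that $\llbracket \lambda a.\, M \rrbracket \supseteq \llbracket \lambda a.\, M' \rrbracket$. The reason is that a function $\lambda a.\, M$ is a value, i.e. it does not execute, and in particular it does not perform any effects, regardless of $M$. Accordingly, $\llbracket \lambda a.\, M \rrbracket$ consists of closures of traces of the form $\kappa[\langle\mu,\mu\rangle]\kappa \therefore f$, where $f$ is a function that returns sets of traces obtained from $\llbracket M \rrbracket$. The fact that $\llbracket M \rrbracket \supseteq \llbracket M' \rrbracket$ is not helpful, because traces in $\llbracket \lambda a.\, M' \rrbracket$ have different returned values $f'$ from traces in $\llbracket \lambda a.\, M \rrbracket$.

Directional compositionality is still useful in the presence of abstractions. For example, if $M$ is a program that returns a location, then from $\llbracket a := v ; a := w \rrbracket \supseteq \llbracket a := w \rrbracket$ it follows that $\llbracket (\lambda a.\, a := v ; a := w)\, M \rrbracket \supseteq \llbracket (\lambda a.\, a := w)\, M \rrbracket$.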
Fig. 10. A selective list of supported non-structural transformations. Along with Symmetry, the denotational semantics supports all symmetric-monoidal laws with the binary operator $(\parallel)$ and the unit $\langle\rangle$. Similar transformations, replacing FAA with other RMWs, are supported too. The abstract rewrite rules used to validate a transformation are mentioned, if there are any.


To deal with the need to prove properties “pointwise” that abstractions bring about, such as containment of denotations in the proof of directional compositionality, we use logical relations. Moggi’s toolkit provides a standard way to define these, thereby lifting properties to their higher-order counterparts.


Transformations exhibiting abstraction. To the best of our knowledge, all transformations $N \twoheadrightarrow M$ proven to be valid under RA in the existing literature are supported by our denotational semantics, i.e. $\llbracket N \rrbracket \supseteq \llbracket M \rrbracket$. Structural transformations are supported by virtue of using Moggi’s standard semantics. Our semantics also validates “algebraic laws of parallel programming”, such as sequencing $M \parallel N \twoheadrightarrow \langle M, N \rangle$ and its generalization that Hoare and van Staden recognized, $(M_1 ; M_2) \parallel (N_1 ; N_2) \twoheadrightarrow (M_1 \parallel N_1) ; (M_2 \parallel N_2)$, which in the functional setting can take the more expressive form in which the values returned are passed on to the following computation. See Fig. 10 for a partial list.

Hence we claim that our denotational semantics is sufficiently abstract. This supports the case that Moggi’s semantic toolkit can successfully scale to handle the intricacies of RA concurrency by adapting Brookes traces.
5 Related Work and Concluding Remarks


Our work follows the approach of Brookes, and its extension to higher-order functions using monads by Benton et al. [6]. Brookes developed a denotational semantics for shared memory concurrency under standard sequential consistency, and established full abstraction w.r.t. a language that has a global atomic await instruction that locks the entire memory. The concepts behind this approach have been used in multiple related developments. We hope that our work, which targets RA, will pave the way for similar developments.

Jagadeesan et al. adapted Brookes’s semantics to the x86-TSO memory model. They showed that for x86-TSO it suffices to include the final store buffer at the end of the trace and add two additional simple closure rules that emulate non-deterministic propagation of writes from store buffers to memory, and identify observably equivalent store buffers. The x86-TSO model, however, is much closer to sequential consistency than RA, which we study in this paper. In particular, unlike RA, x86-TSO is “multi-copy-atomic” (writes by one thread are made globally visible to all other threads at the same time) and successful RMW operations are immediately globally visible. Additionally, the parallel composition construct in Jagadeesan et al. is rather strong: threads are forked and joined only when the store buffers are empty. Being non-multi-copy-atomic, RA requires a more delicate notion of traces and closure rules, but it has more natural meta-theoretic properties, which one would expect from a programming language concurrency model: sequencing, a.k.a. thread-inlining, is unsound under x86-TSO [see 31] but sound under RA.

Burckhardt et al. developed a denotational semantics for hardware weak 
memory models (including x86-TSO) following an alternative approach. They 
represent sequential code blocks by sequences of operations that the code per- 
forms, and close them under certain rewrite rules (reorderings and eliminations) 
that characterize the memory model. This approach does not validate important optimizations, such as Read-Read Elimination. Moreover, unlike x86-TSO, RA cannot be characterized by rewrite operations on SC traces.

Dodds et al. [19] developed a fully abstract denotational semantics for RA, 
extended with fences and non-atomic accesses. Their semantics is based on 
RA’s declarative (a.k.a. axiomatic) formulation as acyclicity criteria on execution 
graphs. Roughly speaking, their denotation of code blocks (that they assume to 
be sequential) quantifies over all possible context execution graphs and calculates 
for each context the “happens-before” relation between context actions that is 
induced by the block. They further use a finite approximation of these histories to automatically validate refinement in a model checker. While we target RA as well, there are two crucial differences between our work and that of Dodds et al. First, we employ Brookes-style totally ordered traces and use an interleaving-based operational presentation of RA. Second, and more importantly, we strive for a compositional semantics where denotations of compound programs are defined as functions of denotations of their constituents, which is not the case for Dodds et al. Their model can nonetheless validate transformations by checking them locally without access to the full program.
Others present non-compositional techniques and tools to check refinement under weak memory models between whole-thread sequential programs that apply for any concurrent context. Poetzl and Kroening considered the SC-for-DRF model, using locks to avoid races. Their approach matches source to target by checking that they perform the same state transitions from lock to subsequent unlock operations and that the source does not allow more data-races. Morisset et al. and Chakraborty and Vafeiadis addressed this problem for the
C/C++11 model, of which RA is a central fragment, by implementing match- 
ing algorithms between source and target that validate that all transformations 
between them have been independently proven to be safe under C/C++11. 


Cho et al. introduced a specialized semantics for sequential programs that can be used for justifying compiler optimizations under weak memory concurrency. They showed that behavior refinement under their sequential semantics implies refinement under any (sequential or parallel) context in the Promising Semantics 2.1. Their work focuses on optimizations of race-free accesses that are similar to C11’s “non-atomics”. It cannot be used to establish the soundness of program transformations that we study in this paper. Adding non-atomics to our model is important future work.


Denotational approaches were developed for models much weaker than RA that allow the infamous Read-Write Reorder and thus, for a high-level programming language, require addressing the challenge of detecting semantic dependencies between instructions [8]. These approaches are based on summarizing multiple partial orders between actions that may arise when a given program is executed under some context. In contrast, we use totally ordered traces by relating to RA’s interleaving operational semantics. In particular, Kavanagh and Brookes [28] use partial orders, Castellan, Paviotti et al. [15] use event structures, and Jagadeesan et al. and Jeffrey et al. employ “Pomsets with Preconditions”, which trades compositionality for supporting non-multi-copy-atomicity, as in RA. These approaches do not validate certain access eliminations, nor Irrelevant Load Introduction, which our model validates.


An exciting aspect of our work is the connection between memory models and Moggi’s monadic approach. For SC, Abadi and Plotkin and Dvir et al. have made an even stronger connection via algebraic theories [42]. These allow one to modularly combine shared memory concurrency with other computational effects. Birkedal et al. develop semantics for a type-and-effect system for SC memory which they use to enhance compiler optimizations based on assumptions on the context that come from the type system. We hope that the current work can serve as a basis to extend such accounts to weaker models.
Abstract. Software Transactional Memory (STM) is an extensively studied paradigm that provides an easy-to-use mechanism for thread safety and concurrency control. With the recent advent of byte-addressable persistent memory, a natural question to ask is whether STM systems can be adapted to support failure atomicity. In this paper, we answer this question by showing how STM can be easily integrated with Intel’s Persistent Memory Development Kit (PMDK) transactional library (which we refer to as TxPMDK) to obtain STM systems that are both concurrent and persistent. We demonstrate this approach using known STM systems, TML and NOrec, which when combined with TxPMDK result in persistent STM systems, referred to as PMDK-TML and PMDK-NOrec, respectively. However, it turns out that existing correctness criteria are insufficient for specifying the behaviour of TxPMDK and our concurrent extensions. We therefore develop a new correctness criterion, dynamic durable opacity, that extends the previously defined notion of durable opacity with dynamic memory allocation. We provide a model of TxPMDK, then show that this model satisfies dynamic durable opacity. Moreover, dynamic durable opacity supports concurrent transactions, thus we also use it to show correctness of both PMDK-TML and PMDK-NOrec.


1 Introduction 


Persistent memory technologies (aka non-volatile memory, NVM) such as Memory- 
Semantic SSD [53] and XL-FLASH [13], combine the durability of hard drives with 
the fast and fine-grained accesses of DRAMs, with the potential to radically change 
how we build fault-tolerant systems. However, NVM also raises fundamental 
questions about semantics and the applicability of standard programming models. 
 1 struct loc {
 2   pmem::obj::p<int> value;
 3   pmem::obj::persistent_ptr<loc> next; };
 4
 5 struct root { pmem::obj::persistent_ptr<loc> head = nullptr; };
 6
 7 void post_crash(...) {
 8   auto pop = pmem::obj::pool<root>::open("file",...);
 9   auto root = pop.root();
10   pmem::obj::transaction::run(pop, [&]{
11     auto xvalue = root->head->value;
12   }); }
13
14 int main(...) {
15   auto pop = pmem::obj::pool<root>::open("file",...);
16   auto root = pop.root();
17   pmem::obj::transaction::run(pop, [&]{
18     auto x = pmem::obj::make_persistent<loc>();
19     x->value = 42;
20     x->next = nullptr;
21     root->head = x;
22   }); }

Fig. 1: C++ snippet for allocating in persistent memory using TxPMDK [54]


Among the most widely used collections of libraries for persistent programming is Intel’s Persistent Memory Development Kit (PMDK), which was first released in 2015 [30]. One important component of PMDK is its transactional library, which we refer to as TxPMDK, and which supports generic failure-atomic programming. A programmer can use TxPMDK to protect against full system crashes by starting a transaction, performing transactional reads and writes, then committing the transaction. If a crash occurs during a transaction, but before the commit, then upon recovery, any writes performed by the transaction will be rolled back. If a crash occurs during the commit, the transaction will either be rolled back or be committed successfully, depending on how much of the commit operation has been executed. If a crash occurs after committing, the effect of the transaction is guaranteed to persist.
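The rollback behaviour described above can be mimicked with a small undo-log model. This is a hypothetical toy, not the PMDK API and not the paper's model: `ToyPool`, `CrashPoint` and `value_after_recovery` are invented names, the log and the active flag are assumed to survive the crash, and commit is treated as a single atomic step, so a crash during commit is not modelled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Where the simulated crash happens relative to the commit.
enum class CrashPoint { None, BeforeCommit, AfterCommit };

// Toy persistent pool: everything in this struct is assumed to survive a crash.
struct ToyPool {
    std::map<std::string, int> persistent;  // durable contents
    std::map<std::string, int> undo_log;    // old values of locations written so far
    bool tx_active = false;

    void begin() { undo_log.clear(); tx_active = true; }

    // Transactional write: log the old value once, then update in place.
    void write(const std::string& loc, int value) {
        if (undo_log.count(loc) == 0) undo_log[loc] = persistent[loc];
        persistent[loc] = value;
    }

    // Commit: discard the log, which makes the writes permanent.
    void commit() { undo_log.clear(); tx_active = false; }

    // Recovery: roll back a transaction that was still active at the crash.
    void recover() {
        if (tx_active) {
            for (const auto& entry : undo_log) persistent[entry.first] = entry.second;
        }
        undo_log.clear();
        tx_active = false;
    }
};

// Runs one transaction that writes 42 to "x", crashes at `crash`, then recovers.
int value_after_recovery(CrashPoint crash) {
    ToyPool pool;
    pool.persistent["x"] = 0;
    pool.begin();
    pool.write("x", 42);
    if (crash != CrashPoint::BeforeCommit) pool.commit();
    pool.recover();  // has no effect when the transaction already committed
    return pool.persistent["x"];
}
```

In this toy, a crash before the commit leaves `x` at its old value after recovery, while a crash after the commit leaves the written value in place, matching the first and last cases in the paragraph above.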


Most software transactional memory (STM) algorithms leave memory alloca- 
tion implicit, since they are generally safe under standard allocation techniques 
(e.g. malloc). Memory that is allocated as part of a transaction can be deallocated 
if the transaction is aborted. However, in the context of persistency, memory 
allocation is more subtle since transactions may be interrupted by a crash. 


For example, consider the program in Fig. 1. Persistent memory is allocated, 
accessed and maintained via memory pools [54] (files that are memory mapped 
into the process address space) of a certain type (e.g. of type loc in Fig. 1). Due 
to address space layout randomization (ASLR) in most operating systems, the 
location of the pool can differ between executions and across crashes. As such, 
every pool has a root object from which all other objects in the pool can be 
found. That is, to avoid memory leaks, all objects in the pool must be reachable from the root. An application locates the root object using a pool object pointer (POP) that is to be created with every program invocation (e.g. line 15). After locating the pool root (line 16), we use a TxPMDK transaction (lines 17-22) to allocate a persistent loc object x (line 18) with value 42 (line 19) and add it to the pool (line 21).

Consider the scenario where the execution of this transaction crashes. After 
recovery from the crash, we then execute post_crash (line 7). As before, we 
open the pool (line 8) and locate its root (line 9). We then use a TxPMDK
transaction to read from the loc object allocated and added at the pool head 
prior to the crash (line 11). There are then three cases to consider: the crash may 
have occurred (1) before the transaction started the commit process, (2) after 
the transaction successfully committed, or (3) while the transaction was in the 
process of committing. 

In case (1), the execution of the two transactions can be depicted as follows, where the PBegin events capture commencing the transactions (lines 17 and 10); PAlloc(x) denotes the persistent allocation of x (line 18); PWrite(x->value, 42) captures writing to x (line 19); and PRead(root->head):x denotes reading from root->head and returning the value x (first part of line 11). As the first transaction never reached the commit stage, its effects (i.e. allocating x and writing to it) should be invisible (i.e. rolled back), and thus the read of the second transaction effectively reads from unallocated memory, leading to an error such as a segmentation fault.


PBegin → PAlloc(x) → PWrite(x->value,42) → PWrite(x->next,...) → PWrite(root->head,x) → [crash] → PBegin → PRead(root->head):x → PRead(x->value) → SegFault

In case (2), the execution of the transactions is as follows, where the PCommit events capture the end (successful commit) of the transactions (lines 22 and 12). The effects of the first transaction fully persist upon successful commit, and thus the read in the second transaction does not fault.


PBegin → PAlloc(x) → ... → PCommit → [crash] → PBegin → PRead(root->head):x → PRead(x->value):42 → PCommit


Finally, in case (3), either of the two behaviours depicted above is possible (i.e. 
the second transaction may either cause a segmentation fault or read from x). 

Efficient and correct memory allocation in a persistent memory setting is 
challenging ([54, Chapter 16] and [55]). In addition to the ASLR issue mentioned 
above, the allocator must guarantee failure atomicity of heap operations on 
several internal data structures managed by PMDK. Therefore, PMDK provides 
its own allocator that is designed specifically to work with TxPMDK.

We identify two key drawbacks of TxPMDK as follows. In this paper, we 
take steps towards addressing both of these drawbacks. 


A) Lack of concurrency support. Unlike existing STM systems in the persis- 
tent setting [39,32] that provide both failure atomicity (ensuring that a transaction 
either commits fully or not at all in case of a crash) and isolation (as defined by ACID properties, ensuring that the effects of incomplete transactions are invisible to concurrently executing transactions), TxPMDK only provides failure atomicity and does not offer isolation in concurrent settings. In particular, naively implemented applications with racy PMDK transactions lead to memory inconsistencies. This is against the spirit of STM: the primary function of STM systems is providing a concurrency control mechanism that ensures isolation. The current TxPMDK implementation provides two solutions: threads either execute concurrent transactions over disjoint parts of the memory [54, Chapter 7], or use user-defined fine-grained locks within a transaction to ensure memory isolation [54, Chapter 14]. However, both solutions are sub-optimal: the former enforces serial execution when transactions operate over the same part of the memory, and the latter expects too much of the user.


B) Lack of a suitable correctness criterion. There is no formal specification describing the desired behaviour of TxPMDK, and hence no rigorous description or correctness proof of its implementation. This undermines the utility of TxPMDK in safety-critical settings and makes it impossible to develop formally verified applications that use TxPMDK. Indeed, there is currently no correctness criterion for STM systems that provide dynamic memory allocation (a large category that includes all realistic implementations).


1.1 Concurrency for TxPMDK


Integrating concurrency with PMDK transactions is an important end goal for PMDK developers. The existing approach requires integration of locks with TxPMDK, which introduces overhead for programmers. Our paper shows that STM and PMDK can be easily combined, improving programmability. Many other works have aimed to develop failure-atomic and concurrent transactions (e.g. OneFile [52] and Romulus [16]), but none use off-the-shelf commercially available libraries. Moreover, these other works have not addressed correctness with the level of rigour that our paper does. In other work, the popular key-value stores Memcached and Redis have been ported to use PMDK [36,37]; our work paves the way for concurrent versions of these applications to be developed. Another example is the work of Chajed et al. [11], who provide a simulation-based technique for verifying refinement of durable filesystems, where concurrency is handled by durable transactions.

We tackle the first drawback (A) mentioned above by developing, specifying, and validating two thread-safe versions of TxPMDK.


Contribution A: Making TxPMDK thread-safe. We combine TxPMDK with two off-the-shelf (thread-safe) STM systems, TML [17] and NOrec [18], to obtain two new implementations, PMDK-TML and PMDK-NOrec, that support concurrent failure-atomic transactions with dynamic memory allocation.
In particular, we reuse the existing concurrency control mechanisms provided by 
these STM systems to ensure atomicity of write-backs, thus obtaining memory 
isolation even in a multi-threaded setting. We show that it is possible to integrate 


154 Azalea Raad, Ori Lahav, John Wickerson, Piotr Balcer, and Brijesh Dongol 


[Figure: diagram relating opacity [27], durable opacity [6], dynamic durable opacity (DDOPACITY; Contribution B2, declarative specification), its operational specification (Contribution B3), the model of PMDK (Contribution B1, §2.3), the implementation of PMDK [30], and PMDK-TML and PMDK-NOrec (Contribution A, §3), with edges labelled “extends”, “refines (Thm. 2)”, “refines (FDR4)”, “modelled by” and “uses”.]

Fig. 2: The contributions of this paper and their relationships to prior work


these mechanisms with TxPMDK to additionally achieve failure atomicity. Our approach is modular, with a clear separation of concerns between the isolation required due to concurrency and the atomicity required due to the possibility of system crashes. This shows that concurrency and failure atomicity are two orthogonal concerns, highlighting a pathway towards a mix-and-match approach to combining (concurrent) STM and failure-atomic transactions. Finally, in order to provide the same interface as PMDK, we extend both TML and NOrec with an explicit operation for memory allocation.


1.2 Specification and Validation 


To tackle drawback (B) above, we make four contributions. Together, they provide 
the first formal (and rigorous) specification of TXPMDK and validation of its 
implementation. 


Contribution B1: A model of TXPMDK. We provide a formal specification 
of TXPMDK as an abstract transition system. Our formal specification models 
almost all key components of TXPMDK (including its redo and undo logs, as well 
as the interaction of these components with system crashes), with the exception 
of memory deallocation within TXPMDK transactions. 


Contribution B2: A correctness criterion for transactions with dynamic 
allocation. Although the literature includes several correctness criteria 
for transactional memory (TM), none can adequately capture TXPMDK in that 
they do not account for dynamic memory allocation. We develop a new correctness 
condition, dynamic durable opacity (denoted DDOPACITY), by extending 
durable opacity [6] to account for dynamic allocation. DDOPACITY supports not 
only sequential transactions such as TXPMDK, but also concurrent ones. To 
demonstrate the suitability of DDOPACITY for concurrent and persistent (durable) 
transactions, we later validate our two concurrent TXPMDK implementations 
(PMDK-NOREC and PMDK-TML) against DDOPACITY. 


Contribution B3: An operational characterisation of our correctness 
criterion. Our aim is to show that TXPMDK conforms to DDOPACITY, or 
more precisely, that our model of TXPMDK refines our model of DDOPACITY. 
To demonstrate this, we use a new intermediate model called DDTMS. While 
DDOPACITY is defined declaratively, DDTMS is defined operationally, which 
makes it conceptually closer to our model of the TXPMDK implementation. We 
prove that DDTMS is a sound model of DDOPACITY (i.e. every trace of DDTMS 
satisfies DDOPACITY). 


Contribution B4: Validation of TXPMDK, PMDK-TML and PMDK-NOREC 
in FDR4. We mechanise our implementations (TXPMDK, PMDK-TML 
and PMDK-NOREC) and specification (DDTMS) using the CSP modelling 
language. We use the FDR4 model checker [26] to show the implementations are 
refinements of DDTMS over both the persistent SC (PSC) [31] and persistent 
TSO (Px86sim) [50] memory models. For Px86sim, we use an equivalent formulation 
called PTSOsyn developed by Khyzha and Lahav [31]. The proof itself is 
fully automatic, requiring no user input outside of the encodings of the models 
themselves. Additionally, we develop a sequential lower bound (DDTMS-Seq), 
derived from DDTMS, and show that this lower bound refines TXPMDK (and 
hence that TXPMDK is not vacuously strong). Our approach is based on an 
earlier technique for proving durable opacity [23], but incorporates much more 
sophisticated examples and memory models. 


Outline. Fig. 2 gives an overview of the different components that we have 
developed in this paper and their relationships to each other and to prior work. 
We structure our paper by presenting the components of Fig. 2 roughly from the 
bottom up. In §2, we present the abstract TXPMDK model, and in §3 we describe 
its integration with STM to provide concurrency support via PMDK-TML and 
PMDK-NOREC. In §4 we present DDOPACITY, in §5 we present DDTMS, and 
in §6 we describe our FDR4 encodings and bounded proofs of refinement. 


Additional Material. We provide our FDR4 development as supplementary 
material [47]. The proofs of all theorems are given in an extended version [46]. 


2 Intel PMDK transactions 


We describe the abstract interface TXPMDK provides to clients (§2.1), our 
assumptions about the memory model over which TXPMDK is run (§2.2) and 
the operations of TXPMDK (§2.3). We present our PMDK abstraction in §2.3. 


2.1 PMDK Interface 


PMDK provides an extensive suite of libraries for simplifying persistent programming. 
The PMDK transactional library (TXPMDK) has been designed to 
support failure-atomicity by providing operations for tracking memory locations 
that are to be made persistent, as well as allocating and accessing (reading and 
writing) persistent memory within an atomic block. 

In Fig. 3 we present an example client code that uses TXPMDK. The code (due 
to [54, p. 131]) implements the push operation for a persistent linked-list queue. 




 1 struct queue_node { 
 2   pmem::obj::p<int> value; 
 3   pmem::obj::persistent_ptr<queue_node> next; }; 
 4 
 5 struct queue { private: 
 6   pmem::obj::persistent_ptr<queue_node> head = nullptr; 
 7   pmem::obj::persistent_ptr<queue_node> tail = nullptr; }; 
 8 
 9 void push(pmem::obj::pool_base &pmem_op, int value) { 
10   pmem::obj::transaction::run(pmem_op, [&]{ 
11     auto node = pmem::obj::make_persistent<queue_node>(); 
12     node->value = value; 
13     node->next = nullptr; 
14     if (head == nullptr) { 
15       head = tail = node; 
16     } else { 
17       tail->next = node; 
18       tail = node; } 
19   }); } 


Fig. 3: C++ persistent push operation using TXPMDK ([54, p. 131]) 


The implementation wraps a typical (non-persistent) push operation within a 
transaction using a C++ lambda [&] expression (line 10). The transaction is 
invoked using transaction::run, which operates over the memory pool pmem_op. 
The node structure (lines 2 and 3), the queue structure (lines 6 and 7), and 
any new node declaration (line 11) are to be tracked by a PMDK transaction. 
Additionally, the push operation takes as input the persistent memory object pool, 
pmem_op, which is a memory pool on which the transaction is to be executed. This 
argument is needed because the application memory may map files from different 
file systems. On line 11 we use make_persistent to perform a transactional 
allocation on persistent memory that is linked to the object pool pmem_op (see 
[54] for details). The remainder of the operation (lines 12-18) corresponds to 
an implementation of a standard push operation with (transactional) reads and 
writes on the indicated locations. At line 19, the C++ lambda and the transaction 
are closed, signalling that the transaction should be committed. 

If the system crashes while push is executing, but before line 19 is executed, 
then upon recovery, the entire push operation will be rolled back so that the 
effect of the incomplete operation is not observed, and the queue remains a 
valid linked list. After line 19, the corresponding transaction executes a commit 
operation. If the system crashes during commit, depending on how much of the 
commit operation has been executed, the push operation will either be rolled 
back, or committed successfully. Note that roll-back in all cases ensures that the 
allocation at line 11 is undone. 


2.2 Memory Models 


We consider the execution of our implementations over two different memory 
models: PSC and PTSOsyn [31]. Both models include a flush x instruction 
to persist the contents of the given location x to memory. PTSOsyn aims for 
fidelity to the Intel x86 architecture. In a race-free setting (as is the case for 
single-threaded TXPMDK transactions) it is sound to use the simpler PSC 
model, though we conduct all of our experiments in both models. 

PSC is a simple model that considers persistency effects and their interaction 
with sequential consistency. Writes are propagated directly to per-location persis- 
tence buffers, and are subsequently flushed to non-volatile memory, either due 
to a system action, or the execution of a flush instruction. A read from x first 
attempts to fetch its value from the persistence buffer and if this fails, fetches its 
value from non-volatile memory. 
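
The PSC behaviour described above can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: the class name PSCMemory, its method names and the explicit crash method are hypothetical and not part of the formal model, and system-initiated flushes are not modelled (only explicit flush calls move data to non-volatile memory).

```python
class PSCMemory:
    """Toy model of PSC: per-location persistence buffers over non-volatile memory."""

    def __init__(self, locations, initial=0):
        self.nvm = {x: initial for x in locations}   # survives crashes
        self.buffer = {x: [] for x in locations}     # pending writes, lost on a crash

    def write(self, x, v):
        # Writes go to the persistence buffer of x, not directly to NVM.
        self.buffer[x].append(v)

    def read(self, x):
        # Prefer the latest buffered write; otherwise fall back to NVM.
        return self.buffer[x][-1] if self.buffer[x] else self.nvm[x]

    def flush(self, x):
        # Drain the buffer of x into NVM in order, so the latest write persists.
        for v in self.buffer[x]:
            self.nvm[x] = v
        self.buffer[x].clear()

    def crash(self):
        # A crash discards every buffered (unpersisted) write.
        for pending in self.buffer.values():
            pending.clear()


mem = PSCMemory(["x", "y"])
mem.write("x", 1)
mem.write("y", 2)
mem.flush("x")                       # only x is guaranteed to be persistent
before_crash = (mem.read("x"), mem.read("y"))
mem.crash()
after_crash = (mem.read("x"), mem.read("y"))
```

In this run the unflushed write to y is visible before the crash but lost afterwards, whereas the flushed write to x survives.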

Under Intel-x86, the memory models are further complicated by the interaction 
between total store ordering (TSO) effects [40] and persistency. Due to the abstract 
nature of our models (see Fig. 4) it is sufficient for us to focus on the simpler 
Px86sim model [50] since we do not use any of the advanced features [48,49,50]. We 
introduce a further simplification via PTSOsyn that is observationally equivalent 
to Px86sim [31]. Unlike Px86sim, which uses a single (global) persistence buffer, 
PTSOsyn uses per-location buffers, simplifying the resulting FDR4 models (§6). 

In PTSOsyn, writes are propagated from the store buffer in FIFO order to a 
per-location FIFO persistence buffer. Writes in the persistence buffer are later 
persisted to the non-volatile memory. A read from location x first attempts to 
fetch the latest write to x from the store buffer. If this fails (i.e. no write to x 
exists in the store buffer), it attempts to fetch the latest write from the persistence 
buffer of x, and if this fails, it fetches the value of x from non-volatile memory. 
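
The read path just described can be sketched in Python as follows. The sketch assumes a single thread (hence a single store buffer), and the class PTSOsynMemory with its propagate, persist and crash methods is a hypothetical illustration rather than the formal model; propagation and persistence steps are triggered explicitly instead of nondeterministically.

```python
from collections import deque


class PTSOsynMemory:
    """Toy single-thread model: store buffer -> per-location persistence buffers -> NVM."""

    def __init__(self, locations, initial=0):
        self.nvm = {x: initial for x in locations}
        self.store_buffer = deque()                      # FIFO of (location, value)
        self.pbuffer = {x: deque() for x in locations}   # per-location FIFO

    def write(self, x, v):
        self.store_buffer.append((x, v))

    def propagate(self):
        # Move the oldest store-buffer entry to the persistence buffer of its location.
        x, v = self.store_buffer.popleft()
        self.pbuffer[x].append(v)

    def persist(self, x):
        # Persist the oldest pending write to x.
        self.nvm[x] = self.pbuffer[x].popleft()

    def read(self, x):
        for loc, v in reversed(self.store_buffer):       # latest write in the store buffer
            if loc == x:
                return v
        if self.pbuffer[x]:                              # latest write in the persistence buffer
            return self.pbuffer[x][-1]
        return self.nvm[x]                               # otherwise non-volatile memory

    def crash(self):
        # Both kinds of buffer are volatile and are lost on a crash.
        self.store_buffer.clear()
        for pending in self.pbuffer.values():
            pending.clear()


mem = PTSOsynMemory(["x"])
mem.write("x", 1)
mem.write("x", 2)
r1 = mem.read("x")    # served by the store buffer
mem.propagate()       # the write of 1 reaches the persistence buffer of x
mem.persist("x")      # the write of 1 reaches NVM
r2 = mem.read("x")    # the write of 2 is still in the store buffer
mem.crash()
r3 = mem.read("x")    # only the persisted write survives
```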


2.3 PMDK Implementation 


We present the pseudo-code of our TXPMDK abstraction in Fig. 4. We model 
all features of TXPMDK (including its redo and undo logs as well as its 
recovery mechanism in case of a crash) except memory deallocation within a 
TXPMDK transaction. We use mem to model the memory, mapping each location 
(in loc) to a value-metadata pair. We model a value (in val) as an integer, and 
metadata as a boolean indicating whether the location is allocated. As we see 
below, the list of free (unallocated) locations, freeList, is calculated during 
recovery using metadata. 

Each PMDK transaction maintains redo logs and an undo log. The redo logs 
record the locations allocated by the transaction so that if a crash occurs while 
committing, the allocated locations can be reallocated, allowing the transaction 
to commit upon recovery. Specifically, TXPMDK uses two distinct redo logs: 
tRedo and pRedo. Both are associated with fields undoValid (which is unset 
when the log is invalidated), checksum (used to indicate whether the log is valid), 
and allocs (which contains the set of locations allocated by the transaction). 
Note that TXPMDK explicitly sets and unsets undoValid, whereas checksum 
is calculated (e.g. at line 36) and may be invalidated by crashes corrupting a 
partially completed write. The undo log records the original (overwritten) value 
of each location written to by the transaction, and is consulted if the transaction 
is to be rolled back. We model it as a map from locations to values (of type int). 
A separate variable undoValid (distinct from undoValid in tRedo and pRedo) 
is used to determine whether this undo log is valid. 

 1 // Each location is persistent; there is no explicitly volatile memory. 
 2 mem : loc -> { 
 3   val : int;          // the contents of this location 
 4   metadata : bool; }  // false = not allocated, true = allocated 
 5 freeList : loc list   // transient list of free locations 
 6 
 7 // Redo logs -- tRedo is transient; pRedo is persistent. 
 8 tRedo_t, pRedo_t : {undoValid:bool; checksum:int; allocs:loc set;} 
 9 undo_t : loc -> int   // undo log recording the original val of each loc 
10 undoValid_t : bool    // undoValid global flag, initially true 

11 PBegin_t ≜ 
12   tRedo_t := (true, -1, {}) 
13   pRedo_t := (true, -1, {}) 
14   undo_t := {} 
15   undoValid_t := true 
16 
17 PAlloc_t ≜ 
18   x_t := freeList.take 
19   tRedo_t.allocs := 
20     tRedo_t.allocs ∪ {x_t} 
21   return x_t 
22 
23 PRead_t(x) ≜ 
24   return mem[x].val 
25 
26 PWrite_t(x,v) ≜ 
27   if x ∉ dom(undo_t) then 
28     w_t := mem[x].val 
29     undo_t := undo_t ∪ {x ↦ w_t} 
30     flush undo_t 
31   mem[x].val := v 
32 
33 PCommit_t ≜ 
34   persist_writes_t 
35   tRedo_t.undoValid := false 
36   tRedo_t.checksum := calc_checksum(tRedo_t) 
37   pRedo_t := tRedo_t 
38   flush pRedo_t 
39   apply_pRedo_t 
40   pRedo_t.checksum := -1 
41   flush pRedo_t.checksum 

42 apply_pRedo_t ≜ 
43   foreach x ∈ pRedo_t.allocs: 
44     mem[x].metadata := true 
45     flush mem[x].metadata 
46   if ¬pRedo_t.undoValid then 
47     undoValid_t := false 
48     flush undoValid_t 
49 
50 persist_writes_t ≜ 
51   foreach x ∈ dom(undo_t): flush x 
52 
53 roll_back_t ≜ 
54   foreach (x ↦ v) ∈ undo_t: 
55     mem[x].val := v 
56   persist_writes_t 
57 
58 PAbort_t ≜ 
59   roll_back_t 
60   undoValid_t := false 
61   flush undoValid_t 
62   foreach x ∈ tRedo_t.allocs: 
63     freeList.add(x) 
64 
65 PRecovery_t ≜ 
66   if calc_checksum(pRedo_t) 
67      = pRedo_t.checksum 
68   then apply_pRedo_t 
69   if undoValid_t then 
70     roll_back_t 
71   foreach x ∈ dom(mem): 
72     if ¬mem[x].metadata then 
73       freeList.add(x) 


Fig. 4: PMDK global variables and pseudo-code 


Each component in Fig. 4 has both a volatile and a persistent copy, although 
some components, e.g. tRedo and freeList, are transient, i.e. their persistent 
versions are never used. Likewise, the persistent redo log, pRedo, is only used in 
a persistent fashion and its volatile copy is never used. 

We now describe the operations in Fig. 4. We assume the operations are 
executed by a transaction with id t. This id is not useful in the sequential 
setting in which TXPMDK is used; however, in our concurrent extension (§3) 
the transaction id is critical. 


PBegin. The begin operation simply sets all local variables to their initial values. 


PAlloc. Allocation chooses and removes a free location, say x, from the free 
list, adds x to the transient redo log (line 20) and returns x. Removing x from 
freeList ensures it is not allocated twice, while the transient redo log is used 
together with the persistent redo log to ensure allocated locations are properly 
reallocated upon a system crash. 

When the transaction commits, the transient redo log is copied to the persistent 
one (line 37), and the effect of the persistent log is applied at line 39 via 
apply_pRedo. (Note that apply_pRedo is also called by PRecovery on line 68.) 
The behaviour of this call depends on how much of the in-flight transaction was 
executed before the crash leading to the recovery. If a crash occurred after the 
transaction executed line 37 and the corresponding write persisted (either due 
to a system flush or the execution of line 38), then executing apply_pRedo via 
PRecovery has the same effect as executing line 39, i.e. the effect of the redo 
log will be applied. This (persistently) sets the metadata field of each location in 
the redo log to indicate that it is allocated (lines 43-45), and then invalidates 
the undo log (lines 46-48) so that the transaction is not rolled back. 


PRead. A read from x simply returns its in-memory value (line 24). Note that 
location x may not be allocated; TxPMDK delegates the responsibility of checking 
whether it is allocated to the client. 


PWrite. A write to x first checks (line 27) if the current transaction has already 
written to x (via a previously executed PWrite). If not, it logs the current value 
by reading the in-memory value of x (line 28) and records it in the undo log 
(line 29). The updated undo log is then made persistent (line 30). Once the 
current value of x is backed up in the undo log (either by the current write or by 
the previous write to x), the value of x in memory is updated to the new value v 
(line 31). As with the read, location x may not have been allocated; TxPMDK 
delegates this check to the client. 
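
The undo-logging discipline of PWrite can be sketched in a few lines of Python. The sketch deliberately ignores persistence (the flush at line 30) and allocation metadata; the names p_write and roll_back are hypothetical stand-ins for PWrite and roll_back, and the initial memory contents are made up for illustration.

```python
mem = {"x": 10, "y": 20}   # location -> value (metadata omitted in this sketch)
undo = {}                  # undo log: location -> original value


def p_write(x, v):
    # Log the original value only on the first write to x in this transaction.
    if x not in undo:
        undo[x] = mem[x]
    mem[x] = v


def roll_back():
    # Restore every logged location to its original value.
    for x, original in undo.items():
        mem[x] = original


p_write("x", 11)
p_write("x", 12)           # second write: the undo entry for x is left untouched
logged = dict(undo)
roll_back()
```

Because only the first write to a location is logged, rolling back restores the value that the location held when the transaction began, regardless of how many times the transaction overwrote it.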


PCommit. The main idea behind the commit operation is to ensure all writes 
are persisted, and that the persistent redo and undo logs are cleared in the correct 
order, as follows. (1) On line 34 all writes performed by the transaction are persisted. 
(2) Next, the transient redo log is invalidated (line 35) and the checksum for 
the log is calculated (line 36). This updated transient log is then set to be the 
persistent redo log (line 37), which is then made persistent (line 38). Note that 
after executing line 38, we can be assured that the transaction has committed; if 
a crash occurs after this point, the recovery will redo and persist the allocation 
and the undo log will be cleared. (3) The operation then calls apply_pRedo at 
line 39, which makes the allocation persistent and clears the undo log. (4) Finally, 
at line 40, the pRedo checksum is invalidated since apply_pRedo has already 
been executed. If a crash occurs after line 40 has been executed, then the recovery 
checks at line 67 and line 69 will fail, i.e. recovery will only reconstruct the free list. 


PAbort. A PMDK transaction is aborted by a PRead/PWrite that attempts to 
access (read/write) an unallocated location. When a transaction is aborted, all 
of its observable effects must be rolled back. First, the memory effects are rolled 
back (line 59), then the undo log is invalidated (line 60) and made persistent 
(line 61), preventing undo from being replayed in case a crash occurs. Finally, all 
of the locations allocated by the executing transaction are freed (lines 62-63). 

Note that if a crash occurs during an abort, the effect of the abort will be replayed. 
PRecovery reconstructs the free list at lines 71-73, which effectively replays the 
loop at lines 62-63 of PAbort. Additionally, if a crash occurs before the write at 
line 60 has persisted, then the effect of undoing the operation will be explicitly 
replayed by the roll-back executed by PRecovery since undoValid holds. If the 
crash occurs after the write at line 60 has persisted, then no roll-back is necessary. 


PRecovery. The recovery operation is executed immediately after a crash, and 
before any other operation is executed. The recovery proceeds in three phases: 
(1) The checksum of the persistent redo log is recalculated (line 67) and if it 
matches the stored checksum (pRedo.checksum) the apply_pRedo operation is 
executed. As discussed, apply_pRedo sets and persists the metadata of each 
location in the redo log, and then invalidates the undo log. (2) The transaction 
is rolled back if apply_pRedo in step 1 fails; otherwise, no roll-back is performed. 
(3) The free list is reconstructed by inserting each location whose metadata is 
set to false into freeList (lines 71-73). 
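
The three recovery phases can be sketched in Python over a dictionary that stands for the persisted state at the time of the crash. This is an illustration only: calc_checksum is left abstract in the pseudo-code, so the checksum function below is a hypothetical choice (picked so that it never equals the invalid marker -1), locations are assumed to be non-negative integers, and the flushes performed by apply_pRedo and roll_back are omitted.

```python
def calc_checksum(redo):
    # Hypothetical checksum; always non-negative, hence never the invalid marker -1.
    return 2 * sum(redo["allocs"]) + int(redo["undoValid"])


def recover(state):
    """Follow the three recovery phases on a persisted state; return the free list."""
    redo = state["pRedo"]
    # Phase 1: apply the persistent redo log if its checksum is valid.
    if calc_checksum(redo) == redo["checksum"]:
        for x in redo["allocs"]:
            state["mem"][x]["metadata"] = True
        if not redo["undoValid"]:
            state["undoValid"] = False
    # Phase 2: roll back if the undo log is still valid.
    if state["undoValid"]:
        for x, original in state["undo"].items():
            state["mem"][x]["val"] = original
    # Phase 3: rebuild the free list from the allocation metadata.
    return sorted(x for x, cell in state["mem"].items() if not cell["metadata"])


def crashed_state():
    # Location 0 was overwritten in place (5 -> 7); location 1 was allocated by the transaction.
    return {
        "mem": {0: {"val": 7, "metadata": True}, 1: {"val": 0, "metadata": False}},
        "pRedo": {"undoValid": True, "checksum": -1, "allocs": set()},
        "undo": {0: 5},
        "undoValid": True,
    }


# Crash before the redo log persisted: the transaction is rolled back.
early = crashed_state()
free_early = recover(early)

# Crash after the redo log persisted with a valid checksum: the commit is completed.
late = crashed_state()
late["pRedo"] = {"undoValid": False, "checksum": -1, "allocs": {1}}
late["pRedo"]["checksum"] = calc_checksum(late["pRedo"])
free_late = recover(late)
```

In the first scenario the write to location 0 is undone and location 1 returns to the free list; in the second, the allocation of location 1 is redone and the write to location 0 is kept.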


Correctness and Thread Safety. As discussed in §2.1, TxPMDK is designed 
to be failure-atomic. This means that correctness criteria such as opacity [27,2] 
and TMS1/TMS2 [20] (restricted to sequential transactions) are inadequate 
since they do not accommodate crashes and recovery. This points to conditions 
such as durable opacity [6], which extends opacity with a persistency model. 
However, durable opacity (restricted to sequential transactions) is also insufficient 
since it does not define correctness of allocations and assumes totally ordered 
histories. In §4 we develop a generalisation of durable opacity, called dynamic 
durable opacity (DDOPACITY) that addresses both of these issues. As with durable 
opacity, DDOPACITY defines correctness for concurrent transactions. We develop 
concurrent extensions of PMDK transactions in §3, which we show to be correct 
against (i.e. refinements of) DDOPACITY. 

As discussed, PMDK transactions are not thread-safe; e.g. concurrent calls 
to PRead and PWrite on the same location create a data race causing PRead to 
return an undefined value (see the example in §1). We discuss techniques for 
mitigating such races in §3. Nevertheless, some PMDK transactional 
operations are naturally thread-safe. In particular, PAlloc is designed to be 
thread-safe via a built-in arena mechanism: a memory pool split into disjoint 
arenas with each thread allocating from its own arena. Moreover, each thread 
uses locks for each arena to publish allocated memory to the shared pool [55]. 

Init: glb = 0 

 1 TxBegin_t ≜ 
 2   do loc_t := glb 
 3   until even(loc_t) 
 4   PBegin_t 
 5 
 6 TxAlloc_t ≜ 
 7   return PAlloc_t 
 8 
 9 TxWrite_t(x, v) ≜ 
10   if even(loc_t) then 
11     if ¬cas(glb, loc_t, loc_t+1) 
12     then PAbort_t; return abort 
13     else loc_t++ 
14   PWrite_t(x,v) 

15 TxRead_t(x) ≜ 
16   v_t := PRead_t(x) 
17   if even(loc_t) then 
18     if glb = loc_t then 
19       return v_t 
20     else PAbort_t; return abort 
21   else return v_t 
22 
23 TxCommit_t ≜ 
24   PCommit_t 
25   if odd(loc_t) then 
26     glb := loc_t+1 
27 
28 Recovery ≜ 
29   foreach t ∈ TXID: 
30     PRecovery_t 
31   glb := 0 


Fig. 5: Pseudo-code for PMDK-TML with our additions made w.r.t. TML 
highlighted red 


3 Making PMDK Transactions Concurrent 


We develop two algorithms that combine two existing STM systems with PMDK. 
The first algorithm (§3) is based on TML [17], which uses pessimistic concurrency 
control via an eager write-back scheme. Writing transactions effectively take a 
lock and perform the writes in place. The second algorithm (§3) is based on 
NOREC [18], which utilises optimistic concurrency control via a lazy write-back 
scheme. In particular, transactional writes are collected in a local write set and 
written back when the transaction commits. 

It turns out that PMDK can be incorporated within both algorithms straight- 
forwardly. This is a strength of our approach and points towards a generic 
technique for extending existing STM systems with failure atomicity. Given the 
challenges of persistent allocation, we reuse PMDK’s allocation mechanisms to 
provide an explicit allocation mechanism in both our extensions [54]. 


PMDK-TML. We present the pseudo-code for PMDK-TML (combining TML 
and TXPMDK) in Fig. 5, where we highlight the calls to TXPMDK operations. 
These calls are the only changes we have made to the TML algorithm. TML is 
based on a single global counter, glb, whose value is read and stored within a 
local variable loc_t when transaction t begins (TxBegin). There is an in-flight 
writing transaction iff glb is odd. TML is designed for read-heavy workloads, and 
thus allows multiple concurrent read-only transactions. A writing transaction 
causes all other concurrent transactions to abort. 

PMDK-TML proposes a modular combination of PMDK with the TML 
algorithm by nesting a PMDK transaction inside a TML transaction; i.e. each 
transaction additionally starts a PMDK transaction. All reads and writes to 
memory are replaced by TXPMDK read and write operations. Moreover, when a 
transaction aborts or commits, the operation calls a TXPMDK abort or commit, 
respectively. Finally, PMDK-TML includes allocation and recovery operations, 
which call TXPMDK allocation and recovery, respectively. The recovery operation 
additionally sets glb to 0. 

A read-only transaction t may call PRead_t at line 16 when another transaction 
t' is executing PWrite_t' at line 14 on the same location. Since TXPMDK does 
not guarantee thread safety for these calls, the value returned by PRead_t should 
not be passed back to the client. This is indeed what occurs. First, note that if 
transaction t is read-only, then loc_t is even. Moreover, a read-only transaction 
only returns the value returned by PRead_t (line 19) if no other transaction has 
acquired the lock since t executed TxBegin_t. In the scenario described above, t' 
must have incremented glb by successfully executing the CAS at line 11 as part 
of the first write operation executed by t', changing the value of glb. This means 
that t would abort since the test at line 18 would fail. 
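
The scenario above can be replayed with a small sequential Python simulation of the TML counter protocol. It is a sketch under explicit assumptions: the interleaving is fixed by the order of the calls rather than by real threads, the TXPMDK calls are replaced by plain dictionary accesses, the CAS is modelled by a compare followed by an assignment, and the names Tx, state and ABORT are hypothetical.

```python
ABORT = "abort"
state = {"glb": 0, "mem": {"x": 0}}


class Tx:
    def __init__(self):
        # TxBegin: snapshot the global counter (assumed even here, i.e. no writer in flight).
        assert state["glb"] % 2 == 0
        self.loc = state["glb"]

    def write(self, x, v):
        if self.loc % 2 == 0:
            # First write: try to take the lock by making glb odd (models the CAS).
            if state["glb"] != self.loc:
                return ABORT
            state["glb"] = self.loc + 1
            self.loc += 1
        state["mem"][x] = v               # eager, in-place write

    def read(self, x):
        v = state["mem"][x]
        if self.loc % 2 == 0 and state["glb"] != self.loc:
            return ABORT                  # a writer took the lock since this transaction began
        return v

    def commit(self):
        if self.loc % 2 == 1:
            state["glb"] = self.loc + 1   # release the lock; glb is even again


reader, writer = Tx(), Tx()
first_read = reader.read("x")    # no writer yet: the value is returned
writer.write("x", 42)            # glb becomes odd
racy_read = reader.read("x")     # glb no longer matches the reader's snapshot
writer.commit()
```

The second read of the reader observes memory that the writer has already modified, but the counter check prevents this value from reaching the client.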


PMDK-NOREC. We present PMDK-NOREC (combining NOREC and PMDK) 
in Fig. 6, where we highlight the calls to TXPMDK. These calls are the only 
changes we have made to the NOREC algorithm. As with TML, NOREC is 
based on a single global counter, glb, whose value is read and stored within a 
transaction-local variable loc when a transaction begins (TxBegin). There is an 
in-flight writing transaction iff glb is odd. Unlike TML, NOREC performs lazy 
write-back, and hence utilises transaction-local read and write sets. A transaction 
only performs the write-back at commit time once it “acquires” the glb lock. Prior 
to write-back and read response, it ensures that the read sets are consistent using a 
per-location validate operation. We eschew details of the NOREC synchronisation 
mechanisms and refer the interested reader to the original paper [18]. 

The transformation from TXPMDK to PMDK-NOREC is similar to PMDK-TML. 
We ensure that a PMDK transaction is started when a PMDK-NOREC 
transaction begins, and that this PMDK transaction is either aborted or committed 
before the PMDK-NOREC transaction completes. We introduce TxAlloc 
and Recovery operations that are identical to PMDK-TML, and replace all calls 
to read and write from memory by PRead and PWrite operations, respectively. 

As with PMDK-TML, a PRead executed by a transaction (at line 12, line 15 
or line 31) may race with a PWrite (at line 43) executed by another transaction. 
However, since PWrite operations are only executed after a transaction takes 
the glb lock (at line 40), any transaction with a racy PRead is revalidated. If 
validation fails, the associated transaction is aborted. 
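
The value-based revalidation that protects racy reads can be sketched as follows. This is a sequential illustration, not the NOREC algorithm itself: the function validate, the dictionary state and the constant ABORT are hypothetical names, memory accesses stand in for PRead, and the retry loop assumes that some other thread eventually makes glb even (in the usage below glb is always even when validate is called).

```python
ABORT = "abort"
state = {"glb": 0, "mem": {"x": 1, "y": 2}}


def validate(rd_set):
    """Return a consistent even timestamp, or ABORT if a logged read is stale."""
    while True:
        time = state["glb"]
        if time % 2 == 1:
            continue                     # a writer holds the lock; retry
        if any(state["mem"][x] != v for x, v in rd_set.items()):
            return ABORT                 # a location in the read set has changed
        if time == state["glb"]:
            return time                  # no writer interfered during validation


rd_set = {"x": state["mem"]["x"]}        # the transaction has read x = 1

state["mem"]["y"] = 5                    # another transaction commits a write to y
state["glb"] += 2
still_valid = validate(rd_set)           # x is unchanged, so the new timestamp is adopted

state["mem"]["x"] = 7                    # another transaction commits a write to x
state["glb"] += 2
stale = validate(rd_set)                 # the logged value of x no longer matches
```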


4 A Declarative Correctness Criterion 


We present a declarative correctness criterion for TM implementations. Unlike 
prior definitions such as (durable) opacity, TMS1/2 etc. that are defined in terms 
of histories of invocations and responses, we define dynamic durable opacity 
(DDOPACITY) in terms of execution graphs, as is standard in the weak memory 
setting. Our models are inspired by prior work on declarative specifications for 
transactional memory, which focussed on specifying relaxed transactions [22,14]. 
However, these prior works do not describe crashes or allocation. 

Init: glb = 0 

 1 TxBegin_t ≜ 
 2   do loc_t := glb 
 3   until even(loc_t) 
 4   PBegin_t 
 5 
 6 TxAlloc_t ≜ 
 7   return PAlloc_t 
 8 
 9 TxRead_t(x) ≜ 
10   if x ∈ dom(wrSet_t) then 
11     return wrSet_t(x) 
12   v_t := PRead_t(x) 
13   while loc_t ≠ glb 
14     loc_t := Validate_t 
15     v_t := PRead_t(x) 
16   rdSet_t := rdSet_t ∪ {x ↦ v_t} 
17   return v_t 
18 
19 Recovery ≜ 
20   foreach t ∈ TXID: 
21     PRecovery_t 
22   glb := 0 

23 TxWrite_t(x,v) ≜ 
24   wrSet_t := wrSet_t ∪ {x ↦ v} 
25 
26 Validate_t ≜ 
27   while true 
28     time_t := glb 
29     if odd(time_t) then goto 28 
30     foreach x ↦ v ∈ rdSet_t: 
31       if PRead_t(x) ≠ v 
32       then PAbort_t; return abort 
33     if time_t = glb 
34     then return time_t 
35 
36 TxCommit_t ≜ 
37   if wrSet_t.isEmpty 
38   then PCommit_t 
39     return 
40   while ¬cas(glb, loc_t, loc_t + 1) 
41     loc_t := Validate_t 
42   foreach x ↦ v ∈ wrSet_t: 
43     PWrite_t(x, v) 
44   PCommit_t 
45   glb := loc_t + 2 
46   return 


Fig. 6: Pseudo-code for PMDK-NOREC, with our additions made w.r.t. NOREC 
highlighted red 


Executions and Events. The traces of memory accesses generated by a 
program are commonly represented as a set of executions, where each execution 
G is a graph comprising: 1. a set of events (graph nodes); and 2. a number of 
relations on events (graph edges). Each event e corresponds to the execution of 
either a transactional event (e.g. marking the beginning of a transaction) or a 
memory access (read/write) within a transaction. 


Definition 1 (Events). An event is a tuple $a = (n, \tau, t, l)$, where $n \in \mathbb{N}$ is 
an event identifier, $\tau \in \mathrm{TID}$ is a thread identifier, $t \in \mathrm{TXID}$ is a transaction 
identifier and $l \in \mathrm{LAB}$ is an event label. 

A label may be B to mark the beginning of a transaction; A to denote a 
transactional abort; (M, x, 0) to denote a memory allocation yielding x initialised 
with value 0; (R, x, v) to denote reading value v from location x; (W, x, v) to denote 
writing v to x; C to mark the beginning of the transactional commit process; or S 
to denote a successful commit. 


The functions tid, tx and lab respectively project the thread identifier, 
transaction identifier and the label of an event. The functions loc, $val_r$ and $val_w$ 
respectively project the location, the read value and the written value of a label, 
where applicable, and are lifted to events by defining e.g. loc(a) = loc(lab(a)). 


Notation. Given a relation $r$ and a set $A$, we write $r^?$, $r^+$ and $r^*$ for the 
reflexive, transitive and reflexive-transitive closures of $r$, respectively. We write $r^{-1}$ 
for the inverse of $r$; $r|_A$ for $r \cap (A \times A)$; $[A]$ for the identity relation on $A$, i.e. 
$\{(a, a) \mid a \in A\}$; irreflexive($r$) for $\nexists a.\ (a, a) \in r$; and acyclic($r$) for irreflexive($r^+$). We write 
$r_1; r_2$ for the relational composition of $r_1$ and $r_2$, i.e. $\{(a, b) \mid \exists c.\ (a, c) \in r_1 \land (c, b) \in r_2\}$. 
When $A$ is a set of events, we write $A_x$ for $\{a \in A \mid loc(a) = x\}$, and $r_x$ for $r|_{A_x}$. 
Analogously, we write $A_t$ for $\{a \in A \mid tx(a) = t\}$. The ‘same-transaction’ relation, 
$st \subseteq E \times E$, is the equivalence relation $st \triangleq \{(a, b) \in E \times E \mid tx(a) = tx(b)\}$. 
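
The relational operators introduced above are straightforward to prototype for finite relations. The following Python sketch represents a relation as a set of pairs; the function names are illustrative choices rather than notation from the paper, and the transitive closure is computed by naive fixed-point iteration, which is adequate only for small examples.

```python
def compose(r1, r2):
    """Relational composition r1; r2."""
    return {(a, b) for (a, c) in r1 for (d, b) in r2 if c == d}


def transitive_closure(r):
    """r+ computed by iterating composition to a fixed point."""
    closure = set(r)
    while True:
        extended = closure | compose(closure, closure)
        if extended == closure:
            return closure
        closure = extended


def irreflexive(r):
    return all(a != b for (a, b) in r)


def acyclic(r):
    return irreflexive(transitive_closure(r))


def restrict(r, A):
    """r restricted to A, i.e. the intersection of r with A x A."""
    return {(a, b) for (a, b) in r if a in A and b in A}


r = {(1, 2), (2, 3)}
r_then_r = compose(r, r)
r_plus = transitive_closure(r)
```

Here r is acyclic, whereas adding the edge (3, 1) closes a cycle and makes acyclic return False.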


Definition 2. An execution, $G \in \mathrm{EXEC}$, is a tuple $(E, po, clo, rf, mo)$, where: 


– $E$ is a set of events. The set of reads in $E$ is $R \triangleq \{e \in E \mid lab(e) = (R, -, -)\}$. 
The sets of allocations (M), writes (W), aborts (A), transactional begins 
(B), transactional commits (C) and commit successes (S) are analogous. 

– $po \subseteq E \times E$ denotes the ‘program-order’ relation, defined as a disjoint union 
of strict total orders, each ordering the events of one thread. 

– $clo \subseteq E \times E$ denotes the ‘client-order’ relation, which is a strict partial order 
between transactions ($st; clo; st \subseteq clo \setminus st$) that extends the program order 
between transactions ($po \setminus st \subseteq clo$). 

– $rf \subseteq (M \cup W) \times R$ denotes the ‘reads-from’ relation between events of the 
same location with matching values; i.e. $(a, b) \in rf \Rightarrow loc(a) = loc(b) \land 
val_w(a) = val_r(b)$. Moreover, $rf$ is total and functional on its range. 

– $mo \subseteq E \times E$ is the ‘modification-order’, defined as the disjoint union of 
relations $\{mo_x\}_{x \in \mathrm{LOC}}$, such that each $mo_x$ is a strict total order on $M_x \cup W_x$. 


Given a relation $r \subseteq E \times E$, we write $r_T$ for lifting $r$ to transaction classes: 
$r_T \triangleq st; (r \setminus st); st$. For instance, when $(w, r) \in rf$, $w$ is a transaction $t_1$ event and 
$r$ is a transaction $t_2$ event, then all events in $t_1$ are $rf_T$-related to all events in $t_2$. 
We write $r_I$ to restrict $r$ to its intra-transactional edges (within a transaction): 
$r_I \triangleq r \cap st$; and write $r_E$ to restrict $r$ to its extra-transactional edges (outside 
a transaction): $r_E \triangleq r \setminus st$. Analogously, we write $r_i$ to restrict $r$ to its intra-thread 
edges: $r_i \triangleq \{(a, b) \in r \mid tid(a) = tid(b)\}$; and write $r_e$ to restrict $r$ to its 
extra-thread edges: $r_e \triangleq r \setminus r_i$. 

In the context of an execution $G$ (we use the “G.” prefix to make this explicit), 
the reads-before relation is $rb \triangleq (rf^{-1}; mo)$. 

Lastly, we write Commit for the events of committing transactions, i.e. those 
that have reached the commit stage: $Commit \triangleq dom(st; [C])$. We define the sets 
of aborted events, Abort, and (commit)-successful events, Succ, analogously. We 
define the set of commit-pending events as $CPend \triangleq Commit \setminus (Abort \cup Succ)$, and 
the set of pending events as $Pend \triangleq E \setminus (CPend \cup Abort \cup Succ)$. 
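
As a concrete illustration of these sets, the following Python sketch classifies the events of a small, hand-made execution with four transactions. The event encoding (tuples of identifier, thread, transaction and label) mirrors Definition 1, but the example events and the helper events_of_transactions_with are hypothetical.

```python
# Each event is (id, thread, transaction, label); a label is "B", "A", "C", "S"
# or a tuple such as ("W", "x", 1).
events = [
    (1, "T1", "t1", "B"), (2, "T1", "t1", ("W", "x", 1)), (3, "T1", "t1", "C"), (4, "T1", "t1", "S"),
    (5, "T2", "t2", "B"), (6, "T2", "t2", ("R", "x", 1)), (7, "T2", "t2", "C"),
    (8, "T3", "t3", "B"), (9, "T3", "t3", "A"),
    (10, "T4", "t4", "B"),
]


def events_of_transactions_with(label):
    """dom(st; [L]): all events whose transaction contains an event labelled `label`."""
    txs = {tx for (_, _, tx, lab) in events if lab == label}
    return {n for (n, _, tx, _) in events if tx in txs}


E = {n for (n, _, _, _) in events}
Commit = events_of_transactions_with("C")
Abort = events_of_transactions_with("A")
Succ = events_of_transactions_with("S")
CPend = Commit - (Abort | Succ)
Pend = E - (CPend | Abort | Succ)
```

Transaction t1 has committed successfully, t2 has started committing but not finished, t3 has aborted and t4 is still running, so the four sets Succ, CPend, Abort and Pend partition the events.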

Given an execution $G = (E, po, clo, rf, mo)$, we write $G|_A$ for $(E \cap A, po|_{E \cap A}, 
clo|_{E \cap A}, rf|_{E \cap A}, mo|_{E \cap A})$. We further impose certain “well-formedness” conditions 
on executions, used to delimit transactions and restrict allocations. For example, 
we require that events of the same transaction are by the same thread and that 
each transaction $t$ contains exactly one begin event. In particular, these conditions ensure 
that in the context of a well-formed execution $G$ we have 1. $G.Succ \subseteq G.Commit$; 
2. each transaction $t$ contains at most a single abort or success ($|G.E_t \cap (A \cup S)| \le 1$) and 
thus $G.(Succ \cap Abort) = \emptyset$; and 3. $G.E = G.(Pend \uplus Abort \uplus CPend \uplus Succ)$, i.e. the 
sets G.Pend, G.Abort, G.CPend and G.Succ are pair-wise disjoint. 


Execution Consistency. The definition of (well-formed) executions above 
puts very few constraints on the rf and mo relations. Such restrictions and 
thus the permitted behaviours of a transactional program are determined by 
defining the set of consistent executions, defined separately for each transactional 
consistency model. The existing literature includes several definitions of
well-known consistency models, including serialisability (SER) [41], snapshot isolation
(SI) [9,44] and parallel snapshot isolation (PSI) [10,43]. 


Serialisability (SER). The serialisability (SER) consistency model [41] is 
one of the most well-known transactional consistency models, as it provides strong 
guarantees that are intuitive to understand and reason about. Specifically, under 
SER, all concurrent transactions must appear to execute atomically one after 
another in a total sequential order. The existing declarative definitions of SER 
[9,10,50] are somewhat restrictive in that they only account for fully committed 
(complete) transactions, i.e. they do not support pending or aborted
transactions. Under the assumption that all transactions are complete, an execution
(E,po, clo, rf, mo) is deemed to be serialisable (i.e. SER-consistent) iff: 


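— $\mathsf{rf}_{\mathsf{I}} \cup \mathsf{mo}_{\mathsf{I}} \cup \mathsf{rb}_{\mathsf{I}} \subseteq \mathsf{po}$ (SER-INT)
— $\mathsf{clo} \cup \mathsf{rf}_{\mathsf{T}} \cup \mathsf{mo}_{\mathsf{T}} \cup \mathsf{rb}_{\mathsf{T}}$ is acyclic. (SER-EXT)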


The SER-INT axiom enforces intra-transactional consistency, ensuring that e.g. a
transaction observes its own writes by requiring $\mathsf{rf}_{\mathsf{I}} \subseteq \mathsf{po}$ (i.e. intra-transactional
reads respect the program order). Analogously, the SER-EXT axiom guarantees
extra-transactional consistency, ensuring the existence of a total sequential order in
which all concurrent transactions appear to execute atomically one after another.
This total order is obtained by an arbitrary extension of the (partial)
‘happens-before’ relation which captures synchronisation resulting from transactional
orderings imposed by client order ($\mathsf{clo}$) or conflict between transactions
($\mathsf{rf}_{\mathsf{T}} \cup \mathsf{mo}_{\mathsf{T}} \cup \mathsf{rb}_{\mathsf{T}}$). Two transactions are conflicted if they both access (read or write)
the same location $x$, and at least one of these accesses is a write. As such, the
inclusion of $\mathsf{rf}_{\mathsf{T}} \cup \mathsf{mo}_{\mathsf{T}} \cup \mathsf{rb}_{\mathsf{T}}$ enforces conflict-freedom of serialisable transactions.
For instance, if transactions $t_1$ and $t_2$ both write to $x$ via events $w_1$ and $w_2$ such
that $(w_1, w_2) \in \mathsf{mo}$, then $t_1$ must commit before $t_2$, and thus the entire effect of
$t_1$ must be visible to $t_2$.
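
The two SER axioms can be checked mechanically on small executions. The sketch below is illustrative only: it assumes that relations are sets of event pairs, that `clo` is given directly as a relation on events, and that all transactions are complete; the function names and the example events are hypothetical.

```python
def compose(r1, r2):
    """Relational composition r1; r2."""
    return {(a, d) for (a, b) in r1 for (c, d) in r2 if b == c}

def is_acyclic(rel):
    """Depth-first search for a cycle in a finite relation."""
    succ = {}
    for a, b in rel:
        succ.setdefault(a, set()).add(b)
    state = {}  # node -> "open" (on the DFS stack) or "done"

    def visit(n):
        if state.get(n) == "open":
            return False  # back edge: a cycle exists
        if state.get(n) == "done":
            return True
        state[n] = "open"
        ok = all(visit(m) for m in succ.get(n, ()))
        state[n] = "done"
        return ok

    return all(visit(n) for n in list(succ))

def is_serialisable(txn_of, po, clo, rf, mo):
    st = {(a, b) for a in txn_of for b in txn_of if txn_of[a] == txn_of[b]}
    rb = compose({(r, w) for (w, r) in rf}, mo)          # rf^{-1}; mo

    def lift(r):                                          # st; (r minus st); st
        return compose(compose(st, r - st), st)

    ser_int = ((rf | mo | rb) & st) <= po                 # (SER-INT)
    ser_ext = is_acyclic(clo | lift(rf) | lift(mo) | lift(rb))  # (SER-EXT)
    return ser_int and ser_ext

# t0 initialises x and y; t1 reads x and writes y; t2 reads y and writes x.
txn_of = {"ix": "t0", "iy": "t0", "r1": "t1", "w1": "t1", "r2": "t2", "w2": "t2"}
po = {("ix", "iy"), ("r1", "w1"), ("r2", "w2")}
mo = {("ix", "w2"), ("iy", "w1")}
# Both readers see the initial values: each transaction reads-before the other.
write_skew = is_serialisable(txn_of, po, set(), {("ix", "r1"), ("iy", "r2")}, mo)
# t2 reads y from t1 instead: t0, t1, t2 is a valid serial order.
in_order = is_serialisable(txn_of, po, set(), {("ix", "r1"), ("w1", "r2")}, mo)
```

In the first execution the lifted reads-before edges form a cycle between `t1` and `t2`, so SER-EXT fails; in the second the lifted relations are acyclic.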


Opacity. We do not stipulate that all transactions commit successfully 
and allow for both aborted and pending transactions. As such, we opt for the 
stronger notion of transactional correctness known as opacity. In what follows we 
describe our notion of opacity over executions (formalised in Def. 3), and later 
relate it to the existing notion of opacity over histories [27] and prove that our 
characterisation of opacity is equivalent to that of the existing one (see Thm. 1). 
Further intuitions are provided in the extended version of this paper [46]. 


166 Azalea Raad, Ori Lahav, John Wickerson, Piotr Balcer, and Brijesh Dongol 
Definition 3 (Opacity). An execution G = (E, po,clo,rf,mo) is opaque iff: 


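— $\mathrm{dom}(\mathsf{rf}_{\mathsf{T}}) \subseteq \mathsf{Vis}$ (VIS-RF)
— $\mathsf{rf}_{\mathsf{I}} \cup \mathsf{mo}_{\mathsf{I}} \cup \mathsf{rb}_{\mathsf{I}} \subseteq \mathsf{po}$ (INT)
— $(\mathsf{clo} \cup \mathsf{rf}_{\mathsf{T}} \cup \mathsf{mo}_{\mathsf{T}} \cup (\mathsf{rb}_{\mathsf{T}}; [\mathsf{Vis}]))$ is acyclic (EXT)


where $\mathsf{Vis} \triangleq \mathsf{Succ} \cup \mathsf{CPendRF}$ with $\mathsf{CPendRF} \triangleq \mathrm{dom}([\mathsf{CPend}]; \mathsf{rf}_{\mathsf{T}})$.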
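
To make the role of Vis concrete, the following sketch computes the visible events and checks the VIS-RF axiom only. It is a simplification under stated assumptions: each transaction carries a single status string (`"succ"`, `"cpend"`, `"abort"` or `"pend"`, all hypothetical names), and a transaction counts as read-from when some other transaction has an rf edge from one of its events.

```python
def visible_events(txn_of, status, rf):
    """Vis = Succ together with commit-pending transactions that are read from."""
    read_from = {txn_of[w] for (w, r) in rf if txn_of[w] != txn_of[r]}
    return {e for e, t in txn_of.items()
            if status[t] == "succ" or (status[t] == "cpend" and t in read_from)}

def vis_rf_holds(txn_of, status, rf):
    """(VIS-RF): every transaction that is read from externally must be visible."""
    vis = visible_events(txn_of, status, rf)
    read_from = {txn_of[w] for (w, r) in rf if txn_of[w] != txn_of[r]}
    dom_rf_T = {e for e, t in txn_of.items() if t in read_from}
    return dom_rf_T <= vis

txn_of = {"w1": "t1", "w2": "t2", "r3": "t3"}
status = {"t1": "succ", "t2": "abort", "t3": "pend"}
reads_committed = vis_rf_holds(txn_of, status, {("w1", "r3")})
reads_aborted = vis_rf_holds(txn_of, status, {("w2", "r3")})
```

Reading from the successful transaction `t1` satisfies the axiom, while reading from the aborted transaction `t2` violates it; note that the reader `t3` itself may be pending in both cases.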


The existing definition of opacity [27] does not account for memory
allocation and assumes that all locations accessed (read/written) by a transaction
are initialised with some value (typically 0). In our setting, we make no such 
assumption and extend the notion of opacity to dynamic opacity to account for 
memory allocation. More concretely, our goal is to ensure that accesses in visible 
transactions are valid, in that they are on locations that have been previously 
allocated in a visible transaction. We define an execution to be dynamically 
opaque (Def. 4) if its visible write accesses are valid, i.e. are mo-preceded by a 
visible allocation. 


Definition 4 (Dynamic opacity). An execution G is dynamically opaque iff 
it is opaque (Def. 3) and $G.(W \cap \mathsf{Vis}) \subseteq \mathrm{rng}([M \cap \mathsf{Vis}]; G.\mathsf{mo})$.


We next use the above definitions to define (dynamic durable) opacity over 
execution histories. In the context of persistent memory where executions may 
crash (e.g. due to a power failure) and resume thereafter upon recovery, a history 
is a sequence of events (Def. 1) partitioned into different eras separated by crash 
markers (recording a crash occurrence), provided that the threads in each era 
are distinct, i.e. thread identifiers from previous eras are not reused after a crash. 


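Definition 5 (Histories). A history, $H \in \mathrm{HIST}$, is a pair $(E, \mathsf{to})$, where $E$
comprises events and crash markers, $E \subseteq \mathrm{EVENT} \cup \mathrm{CRASH}$ with
$\mathrm{CRASH} \triangleq \{(n, ↯) \mid n \in \mathbb{N}\}$, and $\mathsf{to}$ is a total order on $E$, such that:


— $(E, \mathsf{to}_{\mathsf{i}})$ is well-formed; and
— events separated by crashes have distinct threads:
$([E]; \mathsf{to}; [\mathrm{CRASH}]; \mathsf{to}; [E]) \cap \mathsf{to}_{\mathsf{i}} = \emptyset$.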


A history $(E', \mathsf{pto})$ is a prefix of history $(E, \mathsf{to})$ iff $E' \subseteq E$, $\mathsf{pto} = \mathsf{to}|_{E'}$ and
$\mathrm{dom}(\mathsf{to}; [E']) \subseteq E'$.


The client order induced by a history $H = (E, \mathsf{to})$, denoted by $\mathsf{clo}(H)$, is the
partial order on TXID defined by $\mathsf{clo}(H) \triangleq [S \cup A]; \mathsf{to}_{\mathsf{T}}; [B]$. We define history
opacity as a prefix-closed property (cf. [27]), designating a history $H$ as opaque if
every prefix $(E', \mathsf{pto})$ of $H$ induces an opaque execution. The notion of dynamic
opacity over histories is defined analogously.


Definition 6. A history $H$ is opaque iff for each prefix $H_p = (E, \mathsf{pto})$ of $H$,
there exist $\mathsf{rf}, \mathsf{mo}$ such that $(E, \mathsf{pto}_{\mathsf{i}}, \mathsf{clo}(H_p), \mathsf{rf}, \mathsf{mo})$ is opaque (Def. 3). $H$ is
dynamically opaque iff for each prefix $H_p = (E, \mathsf{pto})$ of $H$, there exist $\mathsf{rf}, \mathsf{mo}$ such
that $(E, \mathsf{pto}_{\mathsf{i}}, \mathsf{clo}(H_p), \mathsf{rf}, \mathsf{mo})$ is dynamically opaque (Def. 4).

We define durable opacity over histories: a history $H$ is durably opaque iff the
history obtained from $H$ by removing crash markers is opaque. We define dynamic,
durable opacity analogously.


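Definition 7. A history $(E, \mathsf{to})$ is durably opaque iff $(E \setminus \mathrm{CRASH}, \mathsf{to}|_{E \setminus \mathrm{CRASH}})$
is opaque. A history $(E, \mathsf{to})$ is dynamically and durably opaque iff the history
$(E \setminus \mathrm{CRASH}, \mathsf{to}|_{E \setminus \mathrm{CRASH}})$ is dynamically opaque.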
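
Definitions 5 to 7 can be read operationally as follows. This is a minimal sketch, assuming a history is stored as a Python list in its total order, that crash markers are pairs `(n, "crash")`, and that the existential choice of rf and mo in Def. 6 is hidden inside a caller-supplied predicate `is_opaque_execution` (a hypothetical name).

```python
CRASH = "crash"  # hypothetical label for crash markers (n, "crash")

def strip_crashes(history):
    """Restrict a history to its events, dropping crash markers (Def. 7)."""
    return [e for e in history if e[1] != CRASH]

def prefixes(history):
    """For a total order, the prefixes are exactly the initial segments (Def. 5)."""
    return [history[:k] for k in range(len(history) + 1)]

def is_durably_opaque(history, is_opaque_execution):
    """Def. 7 via Def. 6: every prefix of the crash-free history must be opaque."""
    return all(is_opaque_execution(p) for p in prefixes(strip_crashes(history)))

# Events are (index, thread, label); threads are not reused after the crash.
h = [(1, "t1", "B"), (2, "t1", ("W", "x", 1)), (3, CRASH), (4, "t2", "B")]
events_only = strip_crashes(h)
```

Because the order is total, the downward-closure condition on prefixes reduces to taking initial segments of the list.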


Finally, we show that our definitions of history (durable) opacity are equivalent 
to the original definitions in the literature. (See [46] for the proof.) 


Theorem 1. History opacity as defined in Def. 6 is equivalent to the original 
notion of opacity [27]. History durable opacity as defined in Def. 7 is equivalent 
to the original notion of durable opacity [6]. 


5 Operationally Proving Dynamic Durable Opacity 


We develop an operational specification, DDTMS (§5.1), and prove it correct
against DDOPACITY (§5.2). In particular, we show that every history (i.e.
observable trace) of DDTMS satisfies DDOPACITY. As DDTMS is a concurrent
operational specification, it serves as a basis for validating the correctness of
TXPMDK as well as our concurrent extensions PMDK-TML and PMDK-NOREC.


5.1 DDTMS: The DTMS2 Automaton Extended with Allocation 


DDTMS is based on DTMS2, which is an operational specification that
guarantees durable opacity [6]. DTMS2 in turn is based on the TMS2 automaton [20],
which is known to satisfy opacity [33]. Furthermore, the DDTMS commit
operation includes the simplification described by Armstrong et al. [1], omitting
a validity check when committing read-only transactions. In what follows we
present DDTMS as a transition system.


DDTMS state. Formally, the state of DDTMS is given by the variables in
Fig. 7. DTMS2 keeps track of a sequence of memory stores, mems, one for each
committed writing transaction since the last crash. This allows us to determine
whether reads are consistent with previously committed write operations. Each
committing transaction that contains at least one write adds a new memory
version to the end of the memory sequence. As we shall see, mems tracks allocated
locations since it maps every allocated location to a value different from $\bot$.
Each transaction $t$ is associated with several variables: $\mathsf{pc}_t$, $\mathsf{beginIdx}_t$, $\mathsf{rdSet}_t$,
$\mathsf{wrSet}_t$ and $\mathsf{alSet}_t$. The $\mathsf{pc}_t$ denotes the program counter, ranging over a set of
program counter values ensuring each transaction is well-formed and that each
transactional operation takes effect between its invocation and response. The
$\mathsf{beginIdx}_t \in \mathbb{N}$ denotes the begin index, set to the index of the most recent memory
version when the transaction begins. This is used to ensure the real-time ordering
property between transactions. The $\mathsf{rdSet}_t \in \mathrm{LOC} \rightharpoonup \mathrm{VAL}$ is the read set and
$\mathsf{wrSet}_t \in \mathrm{LOC} \rightharpoonup \mathrm{VAL}$ is the write set, recording the values read and written by




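$\mathit{mems} \in \mathrm{MEM} \triangleq \mathrm{SEQ}(\mathrm{LOC} \to \mathrm{VAL}_\bot)$, with $\mathrm{VAL}_\bot \triangleq \mathrm{VAL} \cup \{\bot\}$, where $\bot \notin \mathrm{VAL}$
$S \in \mathrm{STATE} \triangleq \mathrm{TXID} \to \mathrm{TSTATE}$
$s \in \mathrm{TSTATE} \triangleq \mathbb{N} \times (\mathrm{LOC} \rightharpoonup \mathrm{VAL}) \times (\mathrm{LOC} \rightharpoonup \mathrm{VAL}) \times \mathcal{P}(\mathrm{LOC})$
    storing the local begin index, read set, write set and allocation set
$PC \in \mathrm{PCMAP} \triangleq \mathrm{TXID} \to \mathrm{PCVAL}$
$\mathrm{INVS} \triangleq \{\mathsf{TxBegin}, \mathsf{TxRead}(l), \mathsf{TxWrite}(l, v), \mathsf{TxAlloc}, \mathsf{TxCommit} \mid l \in \mathrm{LOC}, v \in \mathrm{VAL}\}$
$\mathrm{RESPS} \triangleq \{\mathsf{TxBegin}, \mathsf{TxRead}(l, v), \mathsf{TxWrite}(l, v), \mathsf{TxAlloc}(l), \mathsf{TxCommit}, \mathsf{Abort} \mid l \in \mathrm{LOC}, v \in \mathrm{VAL}\}$
$\mathrm{PCVAL} \triangleq \{\mathsf{init}, \mathsf{ready}, \mathsf{aborted}, \mathsf{committed}, \mathsf{fault}, \Delta(i), \nabla(\mathsf{TxCommit}) \mid i \in \mathrm{INVS}\}$
$a \in \mathrm{ACTION} \triangleq \{\mathsf{inv}(i), \mathsf{res}(r), \epsilon, ↯ \mid i \in \mathrm{INVS}, r \in \mathrm{RESPS}\}$

Initially, $PC_0 \triangleq \lambda t.\mathsf{init}$, $S_0 \triangleq \lambda t.(0, \emptyset, \emptyset, \emptyset)$ and $\mathit{mems}_0 \triangleq \langle \lambda x.\bot \rangle$

Fig. 7: DDTMS state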


the transaction during its execution, respectively. We use $S \rightharpoonup T$ to denote a
partial function from $S$ to $T$. Finally, $\mathsf{alSet}_t \subseteq \mathrm{LOC}$ denotes the allocation set,
containing the set of locations allocated by the transaction $t$. We use s.beginIdx,
s.rdSet, s.wrSet and s.alSet to refer to the begin index, read set, write set and
allocation set of a state s, respectively.


The read set is used to determine whether the values read by the transaction
are consistent with its version of memory (using validIdx). The write set, on
the other hand, is required because writes are modelled using deferred update 
semantics: writes are recorded in the transaction’s write set and are not published 
to any shared state until the transaction commits. 
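
One possible encoding of the DDTMS state is sketched below. It is only an illustration: the class and field names are hypothetical, $\bot$ is represented by an absent dictionary key (returned as `None`), and program counters are plain strings.

```python
from dataclasses import dataclass, field

BOTTOM = None  # stands for the "unallocated" value; absent keys map to it

@dataclass
class TxState:
    """Per-transaction state: begin index, read set, write set, allocation set."""
    begin_idx: int = 0
    rd_set: dict = field(default_factory=dict)   # partial map location -> value
    wr_set: dict = field(default_factory=dict)   # deferred writes, private until commit
    al_set: set = field(default_factory=set)     # tentatively allocated locations

@dataclass
class DDTMSState:
    """Global state: memory versions, program counters, transaction states."""
    mems: list = field(default_factory=lambda: [{}])  # initially one empty memory
    pc: dict = field(default_factory=dict)
    states: dict = field(default_factory=dict)

    def lookup(self, n, loc):
        """Value of loc in memory version n, or BOTTOM if unallocated."""
        return self.mems[n].get(loc, BOTTOM)

    def pc_of(self, t):
        return self.pc.get(t, "init")

    def state_of(self, t):
        return self.states.setdefault(t, TxState())

m = DDTMSState()
fresh = m.state_of("t1")
```

Keeping the write set inside `TxState` mirrors the deferred-update semantics: nothing in `mems` changes until a commit appends a new memory version.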


DDTMS Global Transitions. DDTMS is specified by the transition system 
shown in Fig. 8, where the DDTMS global transitions are given at the top and 
the per-transaction transitions are given at the bottom. The global transitions 
may either take a per-transaction step (rule (S)), match a transaction fault (rule 
(F)), crash (rule (X)), or behave chaotically due to a fault (rule (C)). 


Note that a crash transition models both a crash and a recovery. It sets 
the program counter of every live transaction to aborted, preventing them from 
performing any further actions after the crash. Since transaction identifiers are 
not reused, the program counters of completed transactions need not be modified. 
After restarting, it must not be possible for any new transaction to interact with 
stale memory states prior to the crash. Thus, we reset the memory sequence to 
be a singleton sequence containing the last memory state prior to the crash. 
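
The effect of a crash transition can be sketched as a pure function on the program counters and the memory sequence. This is an illustration under assumptions: program counters are strings, any value other than `"init"`, `"committed"` or `"fault"` is treated as belonging to a live (or already aborted) transaction, and the function name is hypothetical.

```python
def crash(pc, mems):
    """Crash and recovery: abort live transactions, keep only the last memory."""
    keep = ("init", "committed", "fault")
    new_pc = {t: (v if v in keep else "aborted") for t, v in pc.items()}
    return new_pc, [mems[-1]]

pc = {"t1": "ready", "t2": "committed", "t3": "init", "t4": "pending(TxRead)"}
mems = [{"x": 0}, {"x": 1}]
new_pc, new_mems = crash(pc, mems)
```

After the crash, no transaction can validate against a memory version older than the one that was current when the crash happened, since those versions are no longer in the sequence.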

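Following the design of TXPMDK (and our concurrent extensions PMDK-TML
and PMDK-NOREC) we do not check for reads and writes to unallocated
memory within the library and instead delegate such checks to the client. An
execution of TXPMDK (as well as PMDK-TML and PMDK-NOREC) that
accesses unallocated memory is assumed to be faulty. In particular, a read or write
of unallocated memory induces a fault (rule (F)). Once a fault is triggered, the
program counter of each transaction is set to “fault” and recovery is impossible.
From a faulty state the system behaves chaotically, i.e. it is possible to generate
any history using rule (C).

$$\mathsf{validIdx}(n, s, \mathit{mems}) \triangleq s.\mathsf{beginIdx} \leq n < |\mathit{mems}| \;\wedge\; s.\mathsf{rdSet} \subseteq \mathit{mems}(n) \;\wedge\; s.\mathsf{alSet} \subseteq \{l \mid \mathit{mems}(n)(l) = \bot\}$$

Global transitions:

$$\dfrac{PC(t), S(t), \mathit{mems} \xrightarrow{a} pc, s, \mathit{mems}' \qquad pc \neq \mathsf{fault}}{PC, S, \mathit{mems} \xrightarrow{a} PC[t \mapsto pc], S[t \mapsto s], \mathit{mems}'}\ (\mathrm{S})
\qquad
\dfrac{PC(t), S(t), \mathit{mems} \xrightarrow{a} \mathsf{fault}, s, \mathit{mems}' \qquad PC' = \lambda t.\mathsf{fault}}{PC, S, \mathit{mems} \xrightarrow{a} PC', S[t \mapsto s], \mathit{mems}'}\ (\mathrm{F})$$

$$\dfrac{PC' = \lambda t.\ \text{if } PC(t) \notin \{\mathsf{init}, \mathsf{committed}, \mathsf{fault}\} \text{ then } \mathsf{aborted} \text{ else } PC(t)}{PC, S, \mathit{mems} \xrightarrow{↯} PC', S, \langle \mathsf{last}(\mathit{mems}) \rangle}\ (\mathrm{X})
\qquad
\dfrac{PC = \lambda t.\mathsf{fault}}{PC, S, \mathit{mems} \xrightarrow{a} PC, S, \mathit{mems}}\ (\mathrm{C})$$

Per-transaction transitions:

$$\dfrac{pc = \mathsf{init} \qquad s' = s[\mathsf{beginIdx} \mapsto |\mathit{mems}| - 1]}{pc, s, \mathit{mems} \xrightarrow{\mathsf{inv}(\mathsf{TxBegin})} \Delta(\mathsf{TxBegin}), s', \mathit{mems}}\ (\mathrm{IB})
\qquad
\dfrac{pc = \Delta(\mathsf{TxBegin})}{pc, s, \mathit{mems} \xrightarrow{\mathsf{res}(\mathsf{TxBegin})} \mathsf{ready}, s, \mathit{mems}}\ (\mathrm{DB})
\qquad
\dfrac{pc = \mathsf{ready} \qquad a \in \mathrm{InvOps}}{pc, s, \mathit{mems} \xrightarrow{\mathsf{inv}(a)} \Delta(a), s, \mathit{mems}}\ (\mathrm{IOP})$$

$$\dfrac{\begin{array}{c} pc = \Delta(\mathsf{TxRead}(l)) \qquad l \notin s.\mathsf{alSet} \cup \mathrm{dom}(s.\mathsf{wrSet}) \\ \mathsf{validIdx}(n, s, \mathit{mems}) \qquad \mathit{mems}(n)(l) = v \qquad v \neq \bot \qquad rs = s.\mathsf{rdSet} \oplus \{l \mapsto v\} \end{array}}{pc, s, \mathit{mems} \xrightarrow{\mathsf{res}(\mathsf{TxRead}(l, v))} \mathsf{ready}, s[\mathsf{rdSet} \mapsto rs], \mathit{mems}}\ (\mathrm{DR\text{-}E})$$

$$\dfrac{\begin{array}{c} pc = \Delta(\mathsf{TxRead}(l)) \qquad l \notin s.\mathsf{alSet} \cup \mathrm{dom}(s.\mathsf{wrSet}) \\ \mathsf{validIdx}(n, s, \mathit{mems}) \qquad \mathit{mems}(n)(l) = \bot \end{array}}{pc, s, \mathit{mems} \xrightarrow{\mathsf{fault}} \mathsf{fault}, s, \mathit{mems}}\ (\mathrm{FR})
\qquad
\dfrac{pc \notin \{\mathsf{init}, \mathsf{ready}, \mathsf{committed}, \mathsf{aborted}, \mathsf{fault}\}}{pc, s, \mathit{mems} \xrightarrow{\mathsf{res}(\mathsf{Abort})} \mathsf{aborted}, s, \mathit{mems}}\ (\mathrm{RA})$$

$$\dfrac{pc = \Delta(\mathsf{TxRead}(l)) \qquad l \in \mathrm{dom}(s.\mathsf{wrSet}) \qquad s.\mathsf{wrSet}(l) = v}{pc, s, \mathit{mems} \xrightarrow{\mathsf{res}(\mathsf{TxRead}(l, v))} \mathsf{ready}, s, \mathit{mems}}\ (\mathrm{DR\text{-}I})
\qquad
\dfrac{pc = \Delta(\mathsf{TxRead}(l)) \qquad l \notin \mathrm{dom}(s.\mathsf{wrSet}) \qquad l \in s.\mathsf{alSet}}{pc, s, \mathit{mems} \xrightarrow{\mathsf{res}(\mathsf{TxRead}(l, 0))} \mathsf{ready}, s, \mathit{mems}}\ (\mathrm{DR\text{-}A})$$

$$\dfrac{pc = \Delta(\mathsf{TxWrite}(l, v)) \qquad l \in s.\mathsf{alSet} \vee \mathsf{last}(\mathit{mems})(l) \neq \bot \qquad ws = s.\mathsf{wrSet} \oplus \{l \mapsto v\}}{pc, s, \mathit{mems} \xrightarrow{\mathsf{res}(\mathsf{TxWrite}(l, v))} \mathsf{ready}, s[\mathsf{wrSet} \mapsto ws], \mathit{mems}}\ (\mathrm{DW})$$

$$\dfrac{pc = \Delta(\mathsf{TxWrite}(l, v)) \qquad l \notin s.\mathsf{alSet} \qquad \mathsf{last}(\mathit{mems})(l) = \bot}{pc, s, \mathit{mems} \xrightarrow{\mathsf{fault}} \mathsf{fault}, s, \mathit{mems}}\ (\mathrm{FW})
\qquad
\dfrac{pc = \Delta(\mathsf{TxAlloc}) \qquad l \notin s.\mathsf{alSet} \qquad as = s.\mathsf{alSet} \uplus \{l\}}{pc, s, \mathit{mems} \xrightarrow{\mathsf{res}(\mathsf{TxAlloc}(l))} \mathsf{ready}, s[\mathsf{alSet} \mapsto as], \mathit{mems}}\ (\mathrm{DA})$$

$$\dfrac{pc = \Delta(\mathsf{TxCommit}) \qquad s.\mathsf{alSet} = \emptyset \qquad \mathrm{dom}(s.\mathsf{wrSet}) = \emptyset}{pc, s, \mathit{mems} \xrightarrow{\epsilon} \nabla(\mathsf{TxCommit}), s, \mathit{mems}}\ (\mathrm{DC\text{-}RO})
\qquad
\dfrac{pc = \nabla(\mathsf{TxCommit})}{pc, s, \mathit{mems} \xrightarrow{\mathsf{res}(\mathsf{TxCommit})} \mathsf{committed}, s, \mathit{mems}}\ (\mathrm{RC})$$

$$\dfrac{\begin{array}{c} pc = \Delta(\mathsf{TxCommit}) \qquad s.\mathsf{alSet} \cup \mathrm{dom}(s.\mathsf{wrSet}) \neq \emptyset \qquad \mathsf{validIdx}(|\mathit{mems}| - 1, s, \mathit{mems}) \\ \mathit{mems}' = \mathit{mems} \mathbin{+\!\!+} \langle (\mathsf{last}(\mathit{mems}) \oplus \{l \mapsto 0 \mid l \in s.\mathsf{alSet}\}) \oplus s.\mathsf{wrSet} \rangle \end{array}}{pc, s, \mathit{mems} \xrightarrow{\epsilon} \nabla(\mathsf{TxCommit}), s, \mathit{mems}'}\ (\mathrm{DC\text{-}W})$$

Fig. 8: The DDTMS global transitions (above) with its per-transaction transitions
(below), where
$\mathrm{InvOps} \triangleq \{\mathsf{TxWrite}(l, v), \mathsf{TxRead}(l) \mid l \in \mathrm{LOC}, v \in \mathrm{VAL}\} \cup \{\mathsf{TxAlloc}, \mathsf{TxCommit}\}$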


DDTMS Per-Transaction Transitions. The system contains externally
visible transitions for invoking an operation (rules IB and IOP), which set the
program counters to $\Delta(a)$, where $a$ is the operation being performed. This allows
the histories of the system to contain operation invocations without corresponding
matching responses.


For the begin, allocation, read and write operations, an invocation can be
followed by a single transition (rules DB, DA, DR-E, DR-I, DR-A and DW)
that performs the operation combined with the corresponding response. Following
an invocation, the commit operation is split into internal do actions ((DC-RO)
and (DC-W)) and an external response (rule RC). Finally, after a read/write
invocation, the system may perform a fault transition for a read (rule FR)
or a write (rule FW). The main change from DTMS2 is the inclusion of an
allocation procedure. The design of DDTMS allows the executing transaction,
$t$, to tentatively allocate a location $l$ within its transaction-local allocation set,
$\mathsf{alSet}_t$. This allocation in DDTMS is optimistic: correctness of the allocation is
only checked when $t$ performs a read or commits.


Successful (non-faulty) read and write operations take allocations into account
as follows. (1) A read operation of transaction $t$ reads from a prior write (rule
(DR-I)) or allocation (rule (DR-A)) performed by $t$ itself. In this case, the
operation may only proceed if the location $l$ is either in the allocation or write
set of $t$. The effect of the operation is to return the value of $l$ in the write set
(if it exists) or 0 if it only exists in the allocation set. (2) A read operation of
transaction $t$ reads from a write or allocation performed by another transaction
(rule (DR-E)). Note that as with DTMS2 and TMS2, in DDTMS a read-only
transaction may serialise with any memory index $n$ after $\mathsf{beginIdx}_t$. Moreover,
within validIdx, in addition to ensuring that $t$’s read set is consistent with the
memory index $n$ (second conjunct), we must also ensure that $t$’s allocation set is
consistent with memory index $n$ (third conjunct) by ensuring that none of the
locations in the allocation set have been allocated at memory index $n$. (3) A
write of transaction $t$ successfully performs its operation (rule (DW)), which can
only happen if the location $l$ being written has been allocated, either by $t$ itself
(first disjunct), or by a prior transaction (second disjunct). A writing transaction
must serialise after the last memory index in mems, thus the second disjunct
checks allocation against the last memory index.
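
The read and write cases above can be sketched as follows. This is not the paper's FDR4 model: it is a simplified Python rendering in which a transaction state is a dictionary with keys `beginIdx`, `rdSet`, `wrSet` and `alSet`, $\bot$ is an absent key, the memory index `n` is chosen by the caller, and `FAULT` is a hypothetical marker standing in for the fault rules.

```python
BOTTOM = None      # stands for the unallocated value; absent keys map to it
FAULT = "fault"    # hypothetical marker for the fault rules (FR) and (FW)

def valid_idx(n, s, mems):
    """The three conjuncts of validIdx: index range, read set, allocation set."""
    return (s["beginIdx"] <= n < len(mems)
            and all(mems[n].get(l, BOTTOM) == v for l, v in s["rdSet"].items())
            and all(mems[n].get(l, BOTTOM) is BOTTOM for l in s["alSet"]))

def tx_read(l, s, mems, n):
    if l in s["wrSet"]:                  # (DR-I): read own write
        return s["wrSet"][l]
    if l in s["alSet"]:                  # (DR-A): read own allocation, yields 0
        return 0
    if not valid_idx(n, s, mems):
        raise ValueError("no read rule is enabled at this index")
    v = mems[n].get(l, BOTTOM)
    if v is BOTTOM:                      # (FR): read of unallocated memory
        return FAULT
    s["rdSet"][l] = v                    # (DR-E): record the external read
    return v

def tx_write(l, v, s, mems):
    if l in s["alSet"] or mems[-1].get(l, BOTTOM) is not BOTTOM:
        s["wrSet"][l] = v                # (DW): deferred update
        return "ok"
    return FAULT                         # (FW): write to unallocated memory

mems = [{}, {"x": 5}]
s = {"beginIdx": 0, "rdSet": {}, "wrSet": {}, "alSet": set()}
first = tx_read("x", s, mems, n=1)
```

Once `x` has been read as 5, index 0 is no longer valid for this transaction, because `x` is unallocated there; this is the read-set conjunct of validIdx at work.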


A successful (non-faulty) commit is split into two cases: (1) $t$ is a read-only
transaction (rule (DC-RO)), where both $\mathsf{alSet}_t$ and $\mathsf{wrSet}_t$ are empty for $t$. In
this case, the transaction simply commits. (2) $t$ has performed an allocation or
a write (rule (DC-W)). Here, we check that $t$ is valid with respect to the last
memory in mems using validIdx. The commit introduces a new memory into the
memory sequence mems. The update also ensures that all pending allocations in
$\mathsf{alSet}_t$ take effect before applying the writes from $t$’s write set.
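
The two commit cases can be sketched in the same style. The encoding is an assumption made for illustration: transaction states are dictionaries with keys `beginIdx`, `rdSet`, `wrSet` and `alSet`, unallocated locations are absent keys, and `commit` returns a success flag together with the (possibly extended) memory sequence.

```python
BOTTOM = None  # stands for the unallocated value; absent keys map to it

def valid_idx(n, s, mems):
    return (s["beginIdx"] <= n < len(mems)
            and all(mems[n].get(l, BOTTOM) == v for l, v in s["rdSet"].items())
            and all(mems[n].get(l, BOTTOM) is BOTTOM for l in s["alSet"]))

def commit(s, mems):
    """Return (committed?, new memory sequence)."""
    if not s["alSet"] and not s["wrSet"]:       # (DC-RO): no validity check
        return True, mems
    if not valid_idx(len(mems) - 1, s, mems):   # (DC-W) is not enabled
        return False, mems
    new = dict(mems[-1])
    new.update({l: 0 for l in s["alSet"]})      # allocations take effect first
    new.update(s["wrSet"])                      # then the deferred writes
    return True, mems + [new]

mems = [{"x": 1}]
writer = {"beginIdx": 0, "rdSet": {}, "wrSet": {"x": 2}, "alSet": {"y"}}
ok, mems2 = commit(writer, mems)
```

Applying the allocation set before the write set means that a location that is both allocated and written by the same transaction ends up with the written value rather than 0.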

5.2 Soundness of DDTMS


We state our main theorem relating DDTMS to DDOPACITY. As the models are 
inherently different, we need several definitions to transform DDTMS histories to 
those compatible with DDOPACITY. 

An execution of a labelled transition system (LTS) is an alternating
sequence of states and actions, i.e. a sequence of the form $s_0\, a_1\, s_1\, a_2 \ldots s_{n-1}\, a_n\, s_n$
such that for each $0 < i \leq n$, $s_{i-1} \xrightarrow{a_i} s_i$ and $s_0$ is an initial state of the
LTS. Suppose $\sigma$ is an execution of DDTMS. We let $AH_\sigma = a_1 a_2 \ldots a_n$ be
the action history corresponding to $\sigma$, and $EH_\sigma$ be the external history of $\sigma$,
which is $AH_\sigma$ restricted to non-$\epsilon$ actions. Let $FF_\sigma$ be the longest fault-free
prefix of $EH_\sigma$. We generate the history (in the sense of Def. 5) corresponding
to $FF_\sigma$ as follows. First, we construct the labelled history, $LH_\sigma$, of $\sigma$ from
$FF_\sigma$ by removing all invocation actions (leaving only responses and crashes).
Then, we replace each response $a_i = \alpha_t$ by the event $(i, t, t, L(\alpha))$, where
$L(\mathsf{res}(\mathsf{TxBegin})) = B$, $L(\mathsf{res}(\mathsf{TxAlloc}(l))) = (M, l, 0)$, $L(\mathsf{res}(\mathsf{TxRead}(l, v))) = (R, l, v)$,
$L(\mathsf{res}(\mathsf{TxWrite}(l, v))) = (W, l, v)$, $L(\mathsf{res}(\mathsf{Abort})) = A$, $L(\mathsf{inv}(\mathsf{TxCommit})) = C$,
and $L(\mathsf{res}(\mathsf{TxCommit})) = S$. Similarly, we replace each crash action $a_i = ↯$ by
the pair $(i, ↯)$. Note that in this construction, for simplicity, we conflate threads
and transactions, but this restriction is straightforward to generalise. Finally, let
the ordered history of $\sigma$, denoted $OH_\sigma$, be the total order corresponding to $LH_\sigma$.
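
The construction of the labelled history can be sketched as a small translation function. Several choices here are assumptions made for illustration: actions are tuples `(t, kind, op, args)`, crashes are the string `"crash"`, the index `i` is the position within the fault-free external history, and the commit invocation is retained because the labelling function assigns it the label C.

```python
def label(kind, op, args):
    """The labelling function L; returns None for actions that carry no label."""
    if kind == "res":
        if op == "TxBegin":
            return "B"
        if op == "TxAlloc":
            return ("M", args[0], 0)
        if op == "TxRead":
            return ("R", args[0], args[1])
        if op == "TxWrite":
            return ("W", args[0], args[1])
        if op == "Abort":
            return "A"
        if op == "TxCommit":
            return "S"
    if kind == "inv" and op == "TxCommit":
        return "C"
    return None  # all other invocations are dropped

def labelled_history(ff):
    """Translate a fault-free external history into events and crash markers."""
    out = []
    for i, a in enumerate(ff, start=1):
        if a == "crash":
            out.append((i, "crash"))
            continue
        t, kind, op, args = a
        lab = label(kind, op, args)
        if lab is not None:
            out.append((i, t, t, lab))   # thread and transaction are conflated
    return out

ff = [("t1", "inv", "TxBegin", ()), ("t1", "res", "TxBegin", ()),
      ("t1", "inv", "TxWrite", ("x", 1)), ("t1", "res", "TxWrite", ("x", 1)),
      "crash"]
lh = labelled_history(ff)
```

Since the indices are strictly increasing, sorting the resulting tuples by their first component recovers the total order used for the ordered history.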


Theorem 2. For any execution $\sigma$ of DDTMS, the ordered history $OH_\sigma$ satisfies
DDOPACITY.


The definitions of (dynamic) durable opacity can be lifted to the level of systems
in the standard manner, providing a notion of correctness for implementations [28].


6 Modelling and Validating Correctness in FDR4 


FDR4 [26] is a model checker for CSP [29] that has recently been used to verify 
linearisability [38], as well as opacity and durable opacity [23]. We similarly provide 
an FDR4 development, which allows proofs of refinement to be automatically 
checked up to certain bounds. This is in contrast to manual methods of proving 
correctness of concurrent objects [21,19], which require a significant amount of 
manual human input (though such manual proofs are unbounded). 

An overview of our FDR4 development [47] is given in Fig. 9. We derive two
specifications from DDTMS. The first is an FDR4 model of DDTMS itself, which is
based on prior work [38,23] but contains the extensions described in §5.1. The second
is DDTMS-Seq, which restricts DDTMS to a sequential crash-free specification.
We use DDTMS-Seq to obtain (lower-bound) liveness-like guarantees, which
strengthen traditional deadlock or divergence proofs of refinement. These
lower-bound checks ensure our models contain at least the traces of DDTMS-Seq.

Fig. 10 summarises our experiments on the upper bound checks, where the
times shown combine the compilation and model exploration times. Each row
represents an experiment that bounds the number of transactions (#txns),
locations (#locs), values (#val) and the size of the persistency and store buffers
(#buff). The times reported are for an Apple M1 device with 16GB of memory.
The first row depicts a set of experiments where the implementations execute
directly on NVM, without any buffers. As we discuss below, these tests are
sufficient for checking lower bounds. The baseline for our checks sets the value
of each parameter to two, and Fig. 10 allows us to see the cost of increasing
each parameter. Note that all models time out when increasing the number
of transactions to three, thus these times are not shown. Also note that for
TXPMDK (which is single-threaded), the checks for PSC also cover PTSOsyn,
since PTSOsyn is equivalent to PSC in the absence of races [31]. Nevertheless,
it is interesting to run the single-threaded experiments on the PTSOsyn model
to understand the impact of the memory model on the checks.

[Diagram: DDTMS-Seq (sequential lower bound); the implementation models
TXPMDK, PMDK-TML and PMDK-NOREC, each combined with a memory model
(PSC or PTSOsyn); and DDTMS (concurrent upper bound). The three layers are
connected by "refines" edges.]
Fig. 9: Overview of FDR4 checks

Memory   #txns  #locs  #val  #buff  TXPMDK    PMDK-TML  PMDK-NOREC
PSC      2      2      2     2      5.83s     5.90s     6.74s
PSC      2      3      2     2      201.03s   213.97s   271.35s
PSC      2      2      3     2      21.65s    23.47s    27.40s
PSC      2      2      2     3      5.83s     5.78s     6.60s
PTSOsyn  2      1      2     2      0.61s     3.96s     1.57s
PTSOsyn  2      2      2     2      6.67s     6.71s     7.73s
PTSOsyn  2      3      2     2      267.1s    268.91s   319.18s
PTSOsyn  2      2      3     2      24.10s    25.53s    29.24s
PTSOsyn  2      2      2     3      14.37s    14.19s    15.41s
Fig. 10: Summary of upper bounds checks (total time in seconds: compilation +
model exploration). The time out (TO) is set to 1000 seconds of compilation time.

In our experiments we use FDR4’s built-in partial order reduction features
to make the upper bound checks feasible. This has a huge impact on the model
checking speed; for instance, the check for PMDK-TML with two transactions,
two locations, two values and buffer size of two reduces from over 6000 seconds
(1 hour and 40 minutes) to under 7 seconds, which is almost a 1000-fold
improvement! This speed-up makes it feasible to use FDR4 for rapid prototyping
when developing programs that use TXPMDK, even for the relatively complex
PTSOsyn memory model.


7 Related Work 


Crash Consistency. Several authors have defined notions of atomicity for
concurrent objects that take persistency into account (see [4] for a survey). None
of these conditions are suitable as they define consistency for concurrent operations
(of concurrent data structures) as opposed to transactional memory.
Approaches to, and semantics for, crash-consistent transactions stretch back to the
mid 1970s, when the problem was considered in the database setting [24,34]. Since
then, a myriad of definitions have been developed for particular applications
(e.g. distributed systems, file systems, etc.). For plain reads and writes, one of
the first studies of persistency models focussed on NVM is by Pelley et al. [42].
Since then, several semantic models for real hardware (Intel and ARM) have
been developed [50,49,31,12,48]. For transactional memory, there are only a few
notions that combine a notion of crash consistency with ACID guarantees as
required for concurrent durable transactions. Raad et al. [50] define persistent
serialisability under relaxed memory, which does not handle aborted transactions.
As we have already discussed, Bila et al. [6] define durable opacity, but this is
defined in terms of (totally ordered) histories as opposed to partially ordered
graphs. Neither persistent serialisability nor durable opacity handles allocation.


Validating the TXPMDK Implementation. Even without a clear consistency
condition, a range of papers have explored correctness of the C/C++
implementation. Bozdogan et al. [8] built a sanitiser for persistent memory and used it
to uncover memory-safety violations in TXPMDK. Fu et al. [25] have built a
tool for testing persistent key-value stores and uncovered consistency bugs in the
PMDK libraries. Liu et al. [36] have built a tool for detecting cross-failure races
in persistent programs, and uncovered a bug in PMDK’s libpmemobj library
(see ‘Bug 4’ in their paper). These works operate at a different level of abstraction
from ours since they focus on the code itself and do not provide any description of the
design principles behind PMDK.


Raad et al. [45] and Bila et al. [7] have developed logics for reasoning about
programs over the Px86-TSO model (which we recall is equivalent to PTSOsyn).
However, these logics have thus far only been applied to small examples.
Extending these logics to cover a proof by simulation and a full (manual) proof of
correctness of PMDK, PMDK-TML and PMDK-NOREC would be a significant
undertaking, but an interesting avenue for future work.


Transactional Memory (TM). Several works have studied the semantics
of TM [15,22,44,43]. However, our work differs from those in that they do not
account for persistency guarantees and crash consistency. Moreover, while earlier
works [44,43] merely propose a model for weak isolation (i.e. mixing transactional
and non-transactional accesses), [15,22] formalise the weak isolation in various
hardware and software TM platforms, albeit without validating their semantics.


Several approaches to crash consistency have recently been proposed. For
a survey and comparison of techniques (in addition to transactions) see [3].
OneFile [52], Romulus [16], and Trinity and Quadra [51] together describe a set
of algorithms that aim to improve the efficiency of TXPMDK by reducing the
number of fence instructions. Liu et al. [35] present DudeTM, a persistent TM
design that uses a shadow copy of NVM in DRAM, which is shared amongst
all transactions. Their approach comprises three key steps: Zardoshti et al. [56]
present an alternative technique for making STMs persistent by instrumenting
STM code with additional logging and flush instructions. However, none of these
works have defined any formal correctness guarantees, and hence do not offer any
proofs of correctness either. In particular, the role of allocation and its interaction
with reads and writes is generally unclear.

As well as defining durable opacity, Bila et al. [6] develop a persistent version
of the TML STM [17] by introducing explicit undo logging and flush instructions.
They then prove this to be durably opaque via the DTMS2 specification. More
recently, Bila et al. [5] have developed a technique for transforming both an
STM and its corresponding opacity proof by delegating reads/writes to memory
locations controlled by the TM to an abstract library that is later refined to use
volatile and non-volatile memory. Neither of these works uses TXPMDK, and both
assume a sequentially consistent memory model.


8 Conclusions and Future Work 


Our main contribution is validating the correctness of TXPMDK via the
development of declarative (DDOPACITY) and operational (DDTMS) consistency criteria.
We provide an abstraction of TXPMDK and show that it satisfies DDTMS and
hence DDOPACITY by extension. Additionally, we develop PMDK-TML and
PMDK-NOREC as two concurrent extensions of TXPMDK that are based on
existing STM designs, and show that these also satisfy DDTMS (and hence
DDOPACITY). All of our models are validated under the PSC and PTSOsyn
memory models using FDR4.

As with most accepted existing transactional models (be it with or without 
persistency), we assume strong isolation, where each non-transactional access 
behaves like a singleton transaction (a transaction with a single access). That 
is, even ignoring persistency, there are no accepted definitions or models for 
mixing non-transactional and transactional accesses, and all existing transactional 
models (including opacity and serialisability) assume strong isolation. Indeed, 
PMDK transactions are specifically designed to be used in a purely transactional 
setting and are not meant to be used in combination with non-transactional 
accesses; i.e. they would have undefined semantics otherwise. Consequently, as 
we do not consider mixing transactional code with non-transactional code, RMW
(read-modify-write) instructions are irrelevant in our setting. Specifically, as
non-transactional accesses are treated as singleton transactions, RMW instructions
are not needed or relevant since they behave as transactions and their atomicity
would be guaranteed by the transactional semantics.

One threat to validity of our work is that the model checking results are on 
a small number of transactions, locations, values, and buffer sizes (see Fig. 10). 
However, we have found that these sizes have been adequate for validating all 
of our examples, i.e., when errors are deliberately introduced, FDR validation 
fails and counter-examples are automatically generated. Currently, we do not 
know whether there is a small model theorem for durable opacity in general. 
This is a separate line of work and a general question that we believe is out 
of the scope of this paper. Specifically, our focus here is on making PMDK 
transactions concurrent, providing a clear specification for PMDK (and its 
concurrent variations) with dynamic allocation, and validating correctness of the 
results under a realistic memory model. 

Abstract. This report extends §6 of the main paper by providing further
details of the mechanisation effort.


1 Modelling and Validating Correctness in FDR4 


FDR4 [4] is a model checker for CSP [5] that has recently been used to verify 
linearisability [7], as well as opacity and durable opacity [3]. We similarly provide 
an FDR4 development, which allows proofs of refinement to be automatically 
checked up to certain bounds. This is in contrast to manual methods of proving 
correctness of concurrent objects [2,1], which require a significant amount of 
manual human input (though such manual proofs are unbounded). FDR4 uses 
a variety of underlying model checking paradigms and partial-order reduction 
techniques [4], depending on the structure of the files to be verified. FDR4 builds 
on FDR3, but the exact implementation details of FDR4 are not publicly available 
since it is a commercial product (available for free academic use). 
The CSP files corresponding to this report may be downloaded from [8]. 


1.1 Modelling Details 


One of the most challenging aspects of the FDR4 development is the modelling
work itself. Our algorithms execute over a shared memory, but the CSP formalism
is based on communicating processes with no notion of shared states. Thus, for each
location we must explicitly define handler processes that communicate through
channels to update and return the values of components (e.g. the addresses,
read/write sets) of each model. Moreover, the implementations (TXPMDK,
PMDK-NOREC and PMDK-TML), the specification (DDTMS) and underlying
memory models (PSC and PTSOsyn) we consider are non-trivial, significantly
increasing the challenge of the modelling effort. Although constructing the models
is challenging, once the models have been developed, they can be combined
in a modular fashion. We have taken advantage of this feature to combine
our implementations with different memory models during development. The
combination of PMDK-TML and TML/NOREC also takes advantage of this
modularity.

This modularity also means that our models are reusable. One could use our 
models to check other developments, e.g. those that use TXPMDK to implement 
other failure-atomic data structures, or verify redesigns of TXPMDK over different 
memory models. Specifically, we use a top-level CSP process (which may comprise 
an interleaved composition of processes for each transaction) to model the most 
general client. Each transaction process begins a transaction, and then calls an 
unbounded number of reads, writes and allocations at non-deterministically chosen 
locations and with non-deterministically chosen values. An in-flight transaction 
process may also non-deterministically choose to terminate by calling commit 
instead of calling a read, write or allocation. Each operation call produces 
an externally visible invocation event, and when the operation terminates, an 
externally visible response is generated. Some operations may respond with an 
abort, in which case the transaction process itself terminates. 

Additionally, there is an externally visible crash event that synchronises 
with all processes. At the level of the abstraction (i.e. DDTMS), this simply 
terminates all in-flight transactions, and resets the memory sequence (as detailed 
by the rule (X)). At the level of the implementation, all in-flight transactions are 
terminated and additionally, the store and persistency buffers are cleared. This 
means that when execution resumes, the value of each location is taken from 
NVM. Immediately after a crash (and before any other processes are started), the 
recovery process corresponding to the algorithm is executed. Note that transaction 
identifiers are never reused. 
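
The implementation-level crash behaviour described above can be illustrated with a small Python sketch. This is not taken from the FDR4 models: the class `Memory` and its method names are hypothetical, and the store and persistency buffers are collapsed into a single volatile buffer for brevity.

```python
class Memory:
    """Toy memory: a volatile buffer layered over persistent NVM (an assumption)."""

    def __init__(self, locations):
        self.nvm = {loc: 0 for loc in locations}  # contents survive a crash
        self.buffer = {}                          # pending writes, lost on a crash

    def write(self, loc, val):
        self.buffer[loc] = val                    # buffered, not yet persistent

    def read(self, loc):
        # A read sees the buffered value if there is one, else the NVM value.
        return self.buffer.get(loc, self.nvm[loc])

    def flush(self, loc):
        # Propagate the pending write for loc (if any) to NVM.
        if loc in self.buffer:
            self.nvm[loc] = self.buffer.pop(loc)

    def crash(self):
        # Buffers are cleared; after the crash every value is taken from NVM.
        self.buffer.clear()


mem = Memory(["x", "y"])
mem.write("x", 1)
mem.flush("x")      # x reaches NVM before the crash
mem.write("y", 1)   # y is still buffered when the crash occurs
mem.crash()
print(mem.read("x"), mem.read("y"))  # prints: 1 0
```

Only the flushed write to x survives the crash, which is the situation a recovery process has to repair.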

We omit further details of our FDR4 models, since they are provided as 
supplementary material [8], and refer the interested reader to prior 
works [7,3]. 


1.2 Overview of Development 


An overview of our FDR4 development is given in Fig. 1. We derive two 
specifications from DDTMS. The first is an FDR4 model of DDTMS itself, which is 
based on prior work [7,3] but contains the extensions required for DDTMS. The 
second is DDTMS-SEQ, which restricts DDTMS to a sequential crash-free 
specification. We use DDTMS-SEQ to obtain (lower-bound) liveness-like guarantees, 
which strengthen traditional deadlock or divergence proofs of refinement. These 
lower-bound checks ensure our models contain at least the traces of DDTMS-SEQ. 
Fig. 1: Overview of FDR4 checks. [Diagram: DDTMS-SEQ (sequential lower bound) 
refines the implementations (PMDK, PMDK-TML, PMDK-NORec), which use the memory 
models (PSC, PTSOsyn) and refine DDTMS (concurrent upper bound).] 

Memory    #txns  #locs  #val  #buff  TXPMDK    PMDK-TML  PMDK-NORec 
PSC       2      2      2     2      5.83s     5.90s     6.74s 
PSC       2      3      2     2      201.03s   213.97s   271.35s 
PSC       2      2      3     2      21.65s    23.47s    27.40s 
PSC       2      2      2     3      5.83s     5.78s     6.60s 
PTSOsyn   2      1      2     2      0.61s     3.96s     1.57s 
PTSOsyn   2      2      2     2      6.67s     6.71s     7.73s 
PTSOsyn   2      3      2     2      267.1s    268.91s   319.18s 
PTSOsyn   2      2      3     2      24.10s    25.53s    29.24s 
PTSOsyn   2      2      2     3      14.37s    14.19s    15.41s 

Fig. 2: Summary of upper bounds checks (total time in seconds: compilation + 
model exploration). The time out (TO) is set to 1000 seconds of compilation time. 


CSP files. Our development comprises the following files. 


File             Description 

Types.csp        Contains the basic types and parameters. Use this file to 
                 increase / decrease the number of transactions, memory 
                 locations, values, etc. Defaults to 2 transactions, 
                 2 locations and 2 values. 

MemoryP.csp      Handler for memory, as well as the redo and undo logs. 
                 Operations query handlers to read/update the shared memory, 
                 flush to persistent memory and recover. This file is used to 
                 switch between memory models (NVM (which contains no 
                 crashes), PSC and PTSOsyn); see the bottom of the file. 

LocHandler.csp   Handler for local memory (i.e., the loc variable used by the 
                 implementations in Figs. 5 and 6). 

ddTMS.csp        Model of the DDTMS automaton from the main paper (Fig. 8). 

PMDK.csp         Model of PMDK from Fig. 4 of the main paper. 

PMDK-TML.csp     Model of PMDK-TML from Fig. 5 of the main paper. 

PMDK-NOrec.csp   Model of PMDK-NORec from Fig. 6 of the main paper. 

Refinement.csp   File containing all checks to be performed. 


Description of Tests. The file Refinement.csp comprises six tests as detailed 
in Figs. 9 and 10 of the paper. There are three upper-bound checks, which show 
that PMDK, PMDK-TML and PMDK-NORec are refinements of DDTMS, 
validating soundness: 


- FinalTMS [T= PMDK, checking that PMDK refines DDTMS. 
- FinalTMS [T= FinalTML, checking that PMDK-TML refines DDTMS. 
- FinalTMS [T= FinalNOrec, checking that PMDK-NORec refines DDTMS. 


Each of these tests can be run against the memory models: NVM (which contains 
no crashes), PSC and PTSOsyn by commenting/uncommenting the relevant 
lines at the end of the file MemoryP.csp. 

Additionally, there are three lower-bound checks, which show that DDTMS-SEQ 
is a refinement of PMDK, PMDK-TML and PMDK-NORec: 
- PMDK [T= SeqFinalTMS 
- FinalTML [T= SeqFinalTMS 
- FinalNOrec [T= SeqFinalTMS 


Each of these tests can be run against the memory models: NVM and PSC as 
defined in the file MemoryP.csp. Note that the test against PTSOsyn times out. 
However, the tests above are sufficient since PTSOsyn reduces to PSC in the 
absence of data races (e.g., sequential executions). 
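
The meaning of an assertion `Spec [T= Impl` (trace refinement: every finite trace of `Impl` is also a trace of `Spec`) can be illustrated on tiny labelled transition systems. The Python sketch below is a bounded-depth illustration only and says nothing about how FDR4 performs the check; the dictionaries and function names are hypothetical.

```python
def traces(lts, start, depth):
    """All traces of length <= depth of an LTS given as
    {state: [(event, next_state), ...]}."""
    result = {()}
    frontier = {((), start)}
    for _ in range(depth):
        nxt = set()
        for trace, state in frontier:
            for event, target in lts.get(state, []):
                nxt.add((trace + (event,), target))
        result |= {t for t, _ in nxt}
        frontier = nxt
    return result


def trace_refines(spec, spec_start, impl, impl_start, depth):
    """Bounded check of spec [T= impl: traces(impl) is a subset of traces(spec)."""
    return traces(impl, impl_start, depth) <= traces(spec, spec_start, depth)


# Specification: a transaction begins, then either commits or aborts.
spec = {0: [("begin", 1)], 1: [("commit", 2), ("abort", 2)]}
# Implementation: a transaction that never aborts.
impl = {0: [("begin", 1)], 1: [("commit", 2)]}

print(trace_refines(spec, 0, impl, 0, depth=3))  # True: impl refines spec
print(trace_refines(impl, 0, spec, 0, depth=3))  # False: spec has an extra trace
```

The second call swaps the roles of the two systems, in the same way that the lower-bound checks use the more detailed model on the left-hand side of `[T=`.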

Each check in FDR4 is split into two phases: (1) a compilation phase that 
builds the models; and (2) a model exploration phase. The characteristics of the 
upper and lower bounds checks are distinct. When naively checking the upper 
bound, compilation is almost instantaneous but model exploration times can be 
significant; these times are swapped for the lower bounds checks. 

In general, lower-bound checks take much longer to verify than upper-bound 
checks, since FDR4 is optimised to verify that abstract (low-detail) 
specifications are refined by concrete (high-detail) implementations. The 
lower-bound checks use the more complex models as the specification, leading to 
the creation of very large space-inefficient models, putting pressure on the 
available system memory. However, the lower-bound checks for PSC and PTSOsyn are 
superseded by the corresponding checks over NVM, since the memory models PSC and 
PTSOsyn are both supersets of NVM. That is, any trace over NVM must also be a 
trace of PSC and PTSOsyn. For two transactions, two locations and two values, the 
checks for PMDK, PMDK-TML and PMDK-NORec take 12.16, 17.36, and 56.02 
seconds, respectively. 


1.3 Summary of Results 


Fig. 2 summarises our experiments on the upper bound checks, where the times 
shown combine the compilation and model exploration times. Each row represents 
an experiment that bounds the number of transactions (#txns), locations (#locs), 
values (#val) and the size of the persistency and store buffers (#buff). The times 
reported are for an Apple M1 device with 16GB of memory. The first row depicts a 
set of experiments where the implementations execute directly on NVM, without 
any buffers. As we discuss below, these tests are sufficient for checking lower 
bounds. The baseline for our checks sets the value of each parameter to two, 
and Fig. 2 allows us to see the cost of increasing each parameter. Note that all 
models time out when increasing the number of transactions to three, thus these 
times are not shown. Also note that for TXPMDK (which is single-threaded), 
the checks for PSC also cover PTSOsyn, since PTSOsyn is equivalent to PSC in 
the absence of races [6]. Nevertheless, it is interesting to run the single-threaded 
experiments on the PTSOsyn model to understand the impact of the memory 
model on the checks. 

In our experiments we use FDR4’s built-in partial order reduction features 
to make the upper bound checks feasible. This has a huge impact on the model 
checking speed; for instance, the check for PMDK-TML with two transactions, 
two locations, two values and buffer size of two reduces from over 6000 seconds 
(1 hour and 40 minutes) to under 7 seconds, which is almost a 1000-fold 
improvement! This speed-up makes it feasible to use FDR4 for rapid prototyping 
when developing programs that use TXPMDK, even for the relatively complex 
PTSOsyn memory model. 
Abstract. We present a general framework for specifying and verifying 
persistent libraries, that is, libraries of data structures that provide some 
persistency guarantees upon a failure of the machine they are execut- 
ing on. Our framework enables modular reasoning about the correctness 
of individual libraries (horizontal and vertical compositionality) and is 
general enough to encompass all existing persistent library specifications 
ranging from hardware architectural specifications to correctness con- 
ditions such as durable linearizability. As case studies, we specify the 
FliT and Mirror libraries, verify their implementations over Px86, and 
use them to build higher-level durably linearizable libraries, all within 
our framework. We also specify and verify a persistent transaction library 
that highlights some of the technical challenges which are specific to per- 
sistent memory compared to weak memory and how they are handled by 
our framework. 


1 Introduction 


Persistent memory (PM), also known as non-volatile memory (NVM), is a new 
kind of memory, which can be used to extend the capacity of regular RAM, 
with the added benefit that its contents are preserved after a crash (e.g. a power 
failure). Employing PM can boost the performance of any program with access 
to data that needs to survive power failures, be it a complex database or a plain 
text editor. 

Nevertheless, doing so is far from trivial. Data stored in PM is mediated 
through the processors’ caching hierarchy, which generally does not propagate 
all memory accesses to the PM in the order issued by the processor, but rather 
performs these accesses on the cache and only propagates them to the memory 
asynchronously when necessary (i.e. upon a cache miss or when the cache has 
reached its capacity limit). Caches, moreover, do not preserve their contents upon 
a power failure, which results in rather complex persistency models describing 
when and how stores issued by a program are guaranteed to survive a power 
failure. To ensure correctness of their implementations, programmers have to 
use low-level primitives, such as flushes of individual cache lines, fences that 
enforce ordering of instructions, and non-temporal stores that bypass the cache 
hierarchy. 

These primitives are often used to implement higher-level abstractions, pack- 
aged into persistent libraries, i.e. collections of data structures that must guar- 
antee to preserve their contents after a power failure. Persistent libraries can be 
thought of as the analogue of concurrent libraries for persistency. And just as 
concurrent libraries require a specification, so do persistent libraries. 


The question naturally arises: what is the right specification for persistent 
libraries? Prior work has suggested a number of candidate definitions, such as 
durable linearizability, buffered durable linearizability [17], and strict lineariz- 
ability [1], which are all extensions of the well-known correctness condition for 
concurrent data structures (i.e. linearizability [15]). In general, these definitions 
stipulate the existence of a total order among all executed library operations, 
a contiguous prefix of which is persisted upon a crash: the various definitions 
differ in exactly what this prefix should be, e.g. whether it is further constrained 
to include all fully executed operations. 


Even though these specifications have a nice compositionality property, we 
argue that none of them are the right specification pattern for every persistent 
concurrent library. While for high-level persistent data structures, such as stacks 
and queues, a strong specification such as durable or strict linearizability would 
be most appropriate, this is certainly not the case for a collection of low-level 
primitives. Take, for instance, a library whose interface simply exposes the ex- 
act primitives of the underlying platform: memory accesses, fences and flushes. 
Their semantics, recently formalized in [30,19,28] in the case of the Intel-x86 
architecture and in [31,5] in the case of the ARMv8 architecture, quite clearly 
do not fit into the framework of the durable linearizability definitions. More 
generally, there are useful concurrent libraries (especially in the context of weak 
memory consistency) that are not linearizable [26]; it is, therefore, conceivable 
that making those libraries persistent will require weak specifications. 

Another key problem with attempting to specify persistent libraries modu- 
larly is that they often break the usual abstraction boundaries. Indeed, some 
models such as epoch persistency [6,24] provide a global persistency barrier that 
affects all memory locations, and therefore all libraries using them. Such global 
operations also occur at higher abstraction layers: persistent transactional li- 
braries often require memory locations to be registered with the library in order 
for them to be used inside transactions. As such, to ensure compatibility with 
such transactional libraries, implementers of other libraries must register all lo- 
cations they use and ensure that any component libraries they use do the same. 


In this paper, we introduce a general declarative framework that addresses 
both of these challenges. Our framework provides a very flexible way of specifying 
persistent libraries, allowing each library to have a very different specification— 
be it durable linearizability or a more complex specification in the style of the 
hardware architecture persistency models. Further, to handle libraries that have 
a global effect (such as persistent barriers above) or, more generally, that make 
some assumptions about the internals of all other libraries, we introduce a tag 
system, allowing us to describe these assumptions modularly. 

Our framework supports both horizontal and vertical compositionality. That 
is, we can verify an execution containing multiple libraries by verifying each 
library separately (horizontal compositionality). Moreover, we can completely 
verify the implementation of a library over a set of other libraries using the 
specifications of its constituent libraries without referring to their 
implementations (vertical compositionality). To achieve the latter, we define a semantic 
notion of substitution in terms of execution graphs, which replaces each library 
node by a suitably constrained set of nodes (its implementation). 

For simplicity, in §2, we develop a first version of our framework over the 
classical notion of an execution history [15], which we extend with a notion 
of crashes. This basic version of our framework includes full support for weak 
persistency models but assumes an interleaving semantics of concurrency; i.e. 
sequential consistency (SC) [23]. 

Subsequently, in §3 we generalise and extend our framework to handle weak 
consistency models such as x86-TSO [32] and RC11 [22], thereby allowing us 
to represent hardware persistency models such as Px86 [30] and PARMv8 [31], 
in our framework. To do so, we rebase our formal development over execution 
graphs using Yacovet [26] as a means of specifying the consistency properties of 
concurrent libraries. 

We illustrate the utility of our framework by encoding in it a number of exist- 
ing persistency models, ranging from actual hardware models such as Px86 [30], 
to general-purpose correctness conditions such as durable linearizability [17]. We 
further consider two case studies, chosen to demonstrate the expressiveness of 
our framework beyond the kind of case studies that have been worked out in the 
consistency setting. 

First, in §4 we use our framework to develop the first formal specifications 
of the FliT [35] and Mirror [10] libraries and establish the correctness of not only 
their implementations against their respective specifications, but also their asso- 
ciated constructions for turning a linearizable library into a durably linearizable 
one. This generic theorem is new compared to the case studies in [26], and lever- 
ages our ‘semantic’ approach in §3. Moreover, our proofs of these constructions 
are the first to establish this result in a weak consistency setting. 

Second, in §5 we specify and prove an implementation of a persistent 
transactional library Ltrans, which provides a high-level construction to persist 
a set of writes atomically. The Ltrans library illustrates the need for a 
‘well-formedness’ specification (in addition to its consistency and persistency 
specifications) that requires clients of the Ltrans library to ensure e.g. that 
Ltrans writes appear only inside transactions. Moreover, it demonstrates the use 
of our tagging system to enable other libraries to interoperate with it. 


Contributions and Outline. The remainder of this article is organised as 
follows. 


§2 We present our general framework for specifying and verifying persistent 
libraries in the strong sequential consistency setting. 

§3 We further generalise our framework to account for weaker consistency mod- 
els. 

§4 We use our framework to develop the first formal specifications of the FliT 
and Mirror libraries, verify their implementations against their specifications 
and prove their general construction theorems for turning linearizable li- 
braries to durably linearizable ones. 
§5 We specify a persistent transactional library Ltrans, develop an 
implementation of Ltrans (over the Intel-x86 architecture) and verify it against 
its specification. We then consider two case studies of vertical and horizontal 
composition in our framework using Ltrans. 


We conclude and discuss related and future work in §6. The full proofs of all 
theorems stated in the paper are given in the technical appendix. 


2 A General Framework for Persistency 


We present our framework for specifying and verifying persistent libraries, which 
are collections of methods that operate on durable data structures. Following 
Herlihy et al. [15], we will represent program histories over a collection of libraries 
A as A-histories, i.e. as sequences of calls to the methods of A, which we will then 
gradually enhance to model persistency semantics. Throughout this section, we 
assume an underlying sequential consistency semantics; in §3 we will generalize 
our framework to account for weaker consistency models. 

We assume the following infinite domains: Meth of method names, Loc of memory 
locations, Tid of thread identifiers, and $\mathsf{Val} \supseteq \mathsf{Loc} \cup \mathsf{Tid}$ 
of values. We let m range over method names, x over memory locations, t over 
thread identifiers, and v over values. An optional value $v_\bot \in \mathsf{Val}_\bot$ is either a 
value $v \in \mathsf{Val}$ or $\bot \notin \mathsf{Val}$. 


2.1 Library Interfaces 


A library interface declares a set of method invocations of the form m(v). Some 
methods are designated as constructors; a constructor returns a location 
pointing to the new library instance (object), which is passed as an argument to 
other library methods. An interface additionally contains a function, loc, which 
extracts these locations from the arguments and return values of its method 
calls. 


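Definition 1. A library interface L is a tuple $(M, M_c, \mathsf{loc})$, where the set of 
method invocations M is a subset of $\mathcal{P}(\mathsf{Meth} \times \mathsf{Val}^*)$, $M_c \subseteq M$ is the set of 
constructors, and $\mathsf{loc} : M \times \mathsf{Val}_\bot \to \mathcal{P}(\mathsf{Loc})$ is the location function. 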


Example 1 (Queue library interface). The queue library interface, $L_{Queue}$, has 
three methods: a constructor QueueNew(), which returns a new empty queue; 
QueueEnq(x, v), which adds value v to the end of queue x; and QueueDeq(x), 
which removes the head entry in queue x. We define 
$\mathsf{loc}(\mathsf{QueueNew}(), x) = \mathsf{loc}(\mathsf{QueueEnq}(x, \_), \_) = \mathsf{loc}(\mathsf{QueueDeq}(x), \_) = \{x\}$. 
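
As an illustration, the queue interface can be written down as a small data structure. The Python sketch below is only one possible encoding: the class `Interface`, the representation of invocations as `(name, args)` pairs, and the use of method names in place of the (infinite) set of invocations are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple

Invocation = Tuple[str, tuple]  # (method name, argument tuple)


@dataclass(frozen=True)
class Interface:
    methods: FrozenSet[str]        # stands in for the set M of invocations
    constructors: FrozenSet[str]   # stands in for M_c, a subset of M
    loc: Callable[[Invocation, Optional[object]], FrozenSet[object]]


def queue_loc(inv, ret):
    """Location function of the queue interface."""
    name, args = inv
    if name == "QueueNew":
        return frozenset({ret})      # constructor: the returned location
    return frozenset({args[0]})      # other methods: the queue argument


QUEUE = Interface(
    methods=frozenset({"QueueNew", "QueueEnq", "QueueDeq"}),
    constructors=frozenset({"QueueNew"}),
    loc=queue_loc,
)

print(QUEUE.loc(("QueueNew", ()), "x"))
print(QUEUE.loc(("QueueEnq", ("x", 7)), None))
```

Both calls yield the singleton set containing the location `"x"`, matching the definition of loc in Example 1.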


A collection A is a set of library interfaces with disjoint method names. When 
A consists of a single library interface L, we often write L instead of {L}. 
2.2 Histories 


Given a collection A, an event $e \in \mathsf{Events}(A)$ of A is either a method invocation 
$m(v)_t$ with $m(v) \in \bigcup_{L \in A} L.M$ and $t \in \mathsf{Tid}$, or a method response (return) event 
$\mathsf{ret}(v)_t$. 

A A-history is a sequence of events of A whose projection to each thread is 
an alternating sequence of invocation and return events which starts with an 
invocation. 


Definition 2 (Sequential event sequences). A sequence of events $e_1 \ldots e_n$ 
is sequential if all its odd-numbered events $e_1, e_3, \ldots$ are invocation events and 
all its even-numbered events $e_2, e_4, \ldots$ are return events. 


Definition 3 (Histories). A A-history is a finite sequence of events 
$H \in \mathsf{Events}(A)^*$, such that for every thread t, the sub-sequence $H[t]$ comprising only 
the events of t is sequential. We write $\mathsf{Hist}(A)$ for the set of all A-histories. 
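
Definitions 2 and 3 translate directly into a check on event sequences. The Python sketch below assumes a hypothetical encoding of events as tuples, `("inv", thread, method, args)` for invocations and `("ret", thread, value)` for returns; neither the encoding nor the function names come from the paper.

```python
from collections import defaultdict


def is_sequential(events):
    """Definition 2: 1st, 3rd, ... events are invocations; 2nd, 4th, ... are returns."""
    return all(e[0] == ("inv" if i % 2 == 0 else "ret")
               for i, e in enumerate(events))


def is_history(events):
    """Definition 3: the projection H[t] to every thread t must be sequential."""
    per_thread = defaultdict(list)
    for e in events:
        per_thread[e[1]].append(e)   # e[1] is the thread identifier
    return all(is_sequential(es) for es in per_thread.values())


good = [("inv", "t1", "QueueEnq", ("q", 1)),
        ("inv", "t2", "QueueDeq", ("q",)),
        ("ret", "t1", None),
        ("ret", "t2", 1)]
bad = [("ret", "t1", None),
       ("inv", "t1", "QueueEnq", ("q", 1))]

print(is_history(good), is_history(bad))  # prints: True False
```

The first sequence interleaves two threads but each projection alternates correctly; the second starts with a return and is rejected.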


When clear from the context, we refer to occurrences of events in a history by 
their corresponding events. For example, if $H = e_1 \ldots e_n$ and $i < j$, we say that $e_i$ 
precedes $e_j$ and that $e_j$ succeeds $e_i$. Given an invocation $m(v)_t$ in H, its matching 
return (when it exists) is the first event of the form $\mathsf{ret}(v)_t$ that succeeds it (they 
share the same thread). A call is a pair $m(v)_t : v_\bot$ of an invocation and either its 
matching return $v_\bot \in \mathsf{Val}$ (complete call) or $v_\bot = \bot$ (incomplete call). 

A library (specification) comprises an interface and a set of consistent histo- 
ries. The latter captures the allowed behaviors of the library, which is a guarantee 
made by the library implementation. 


Definition 4. A library specification (or simply a library) $\mathcal{L}$ is a tuple $(L, S_c)$, 
where L is a library interface, and $S_c \subseteq \mathsf{Hist}(L)$ denotes its set of consistent 
histories. 


2.3 Linearizability 


Linearizability [15] is a standard way of specifying concurrent libraries that have 
a sequential specification S, denoting a set of finite sequences of complete calls. 
Given a sequential specification S, a concurrent library L is linearizable under S 
if each consistent history of L can be linearized into a sequential one in S, while 
respecting the happens-before order, which captures causality between calls. It 
is sufficient to consider consistent executions because inconsistent executions 
are, by definition, guaranteed by the library to never happen. Happens-before is 
defined as follows. 


Definition 5 (Happens-Before). A method call $C_1$ happens before another 
method call $C_2$ in a history H, written $C_1 <_H C_2$, if the response of $C_1$ precedes 
the invocation of $C_2$ in H. When the choice of H is clear from the context, we 
drop the H subscript from $<$. 
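
The notions of call and happens-before can be computed from a history in a few lines. The Python sketch below assumes the same hypothetical tuple encoding of events, `("inv", thread, method, args)` and `("ret", thread, value)`, and represents each call as a dictionary recording the positions of its invocation and (if any) its matching return.

```python
def calls(history):
    """Pair each invocation with its matching return; 'ret' stays None
    for an incomplete call."""
    result = []
    open_call = {}                       # thread -> index of its pending call
    for pos, e in enumerate(history):
        if e[0] == "inv":
            _, t, m, args = e
            open_call[t] = len(result)
            result.append({"thread": t, "method": m, "args": args,
                           "inv": pos, "ret": None, "value": None})
        else:
            _, t, v = e
            c = result[open_call.pop(t)]
            c["ret"], c["value"] = pos, v
    return result


def happens_before(c1, c2):
    """Definition 5: the response of c1 precedes the invocation of c2."""
    return c1["ret"] is not None and c1["ret"] < c2["inv"]


H = [("inv", "t1", "QueueEnq", ("q", 1)),
     ("ret", "t1", None),
     ("inv", "t2", "QueueDeq", ("q",)),
     ("inv", "t3", "QueueEnq", ("q", 2)),
     ("ret", "t2", 1),
     ("ret", "t3", None)]
enq1, deq, enq2 = calls(H)
print(happens_before(enq1, deq))                               # True
print(happens_before(deq, enq2), happens_before(enq2, deq))    # False False
```

The dequeue and the second enqueue overlap in time, so neither happens before the other; an incomplete call never happens before anything because it has no response.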
A history H is linearizable under a sequential specification S if there exists a 
linearization (in the order-theoretic sense) of $<_H$ that belongs to S. The subtlety 
is the treatment of incomplete calls, which may or may not have taken effect. We 
write $\mathsf{compl}(H)$ for the set of histories obtained from a history H by appending 
zero or more matching return events. We write $\mathsf{trunc}(H)$ for the history obtained 
from H by removing its incomplete calls. We can then define linearizability as 
follows [14]. 


Definition 6. A sequential history $H_\ell$ is a sequentialization of a history H if 
there exists $H' \in \mathsf{trunc}(\mathsf{compl}(H))$ such that $H_\ell$ is a linearization of $<_{H'}$. A 
history H is linearizable under S if there exists a sequentialization of H that 
belongs to S. A library L is linearizable under S if all its consistent histories are 
linearizable under S. 


For instance, we can specify the notion of linearizable queues as those that are 
linearizable under the following sequential queue specification, $S_{Queue}$. 


Example 2 (Sequential queue specification). The behavior of a sequential queue, 
$S_{Queue}$, is expressed as a set of sequential histories as follows. Given a 
history H of $L_{Queue}$ and a location $x \in \mathsf{Loc}$, let $H[x]$ denote the sub-history 
containing calls c such that $\mathsf{loc}(c) = \{x\}$. We define $S_{Queue}$ as the set of all 
sequential histories H of $L_{Queue}$ such that for all $x \in \mathsf{Loc}$, $H[x]$ is of the form 
$\mathsf{QueueNew}()_t : x \cdot e_1 \cdots e_n$, where each QueueDeq call in $e_1 \cdots e_n$ returns the 
value of the k-th QueueEnq call, if it exists and precedes the QueueDeq, where k 
is the number of preceding QueueDeq calls returning non-null values; otherwise, 
it returns null. 
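
For small histories, linearizability against the sequential queue specification can be checked by brute force. The Python sketch below is a simplification made for illustration: it handles a single queue, considers only complete calls (so compl and trunc are not needed), and encodes each call as a hypothetical tuple `(method, arg, value, inv_pos, ret_pos)`.

```python
from itertools import permutations


def in_seq_queue_spec(seq):
    """FIFO condition on a sequence of complete calls (method, arg, value)
    made on one queue after its QueueNew."""
    enqueued, k = [], 0            # k counts the successful dequeues so far
    for method, arg, value in seq:
        if method == "QueueEnq":
            enqueued.append(arg)
        elif method == "QueueDeq":
            expected = enqueued[k] if k < len(enqueued) else None
            if value != expected:
                return False
            if expected is not None:
                k += 1
    return True


def linearizable(calls, spec):
    """Try every total order of the calls that respects happens-before."""
    n = len(calls)
    for order in permutations(range(n)):
        rank = {c: i for i, c in enumerate(order)}
        respects_hb = all(rank[a] < rank[b]
                          for a in range(n) for b in range(n)
                          if calls[a][4] < calls[b][3])  # ret of a before inv of b
        if respects_hb and spec([calls[i][:3] for i in order]):
            return True
    return False


overlapping = [("QueueEnq", 1, None, 0, 3),
               ("QueueEnq", 2, None, 1, 2),
               ("QueueDeq", None, 2, 4, 5)]
ordered = [("QueueEnq", 1, None, 0, 1),
           ("QueueEnq", 2, None, 2, 3),
           ("QueueDeq", None, 2, 4, 5)]
print(linearizable(overlapping, in_seq_queue_spec))  # True
print(linearizable(ordered, in_seq_queue_spec))      # False
```

When the two enqueues overlap, either may be linearized first, so a dequeue returning 2 is allowed; once the enqueue of 1 happens before the enqueue of 2, the dequeue must return 1. The search is exponential in the number of calls and is meant only to make the definition concrete.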


2.4 Adding Failures 


Our framework so far does not support reasoning about persistency as it lacks 
the ability to describe the persistent state of a library after a failure. Our first 
extension is thus to extend the set of events of a collection, Events(A), with 
another type of event, a crash event ↯. 

Crash events allow us to specify the durability guarantees of a library. For 
instance, a library that does not persist any of its data may specify that a 
history with crash events is consistent if all of its sub-histories between crashes 
are (independently) consistent. In other words, in such a library, the method 
calls before a crash have no effect on the consistency of the history after the 
crash. We modify the definition of happens-before accordingly by treating a crash 
event both as an invocation and a return event. We also assume that, after a crash, 
the thread ids of the new threads are distinct from that of all the pre-crash 
threads. For libraries that do persist their data, a useful generic specification is 
durable linearizability [17], defined as follows. 


Definition 7. Given a history H, let $\mathsf{ops}(H)$ denote the sub-history obtained 
from H by removing all its crash markers. A history H is durably linearizable 
under S if there exists a sequentialization $H_\ell$ of $\mathsf{ops}(H)$ such that $H_\ell \in S$. 
Intuitively, this ensures that operations persist before they return, and they 
persist in the same order as they take effect before a crash. 
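
The effect of removing crash markers can be seen on a small register example. The Python sketch below is restricted to histories whose calls are complete and do not overlap, so that the only sequentialization of ops(H) is ops(H) itself; the encoding of calls as `("W", x, v)` and `("R", x, v)` tuples and the marker `CRASH` are assumptions made for this example.

```python
CRASH = "crash"   # hypothetical stand-in for the crash marker


def ops(history):
    """Remove all crash markers from a history."""
    return [e for e in history if e != CRASH]


def eras(history):
    """Split a history at its crash markers into crash-free eras."""
    result = [[]]
    for e in history:
        if e == CRASH:
            result.append([])
        else:
            result[-1].append(e)
    return result


def in_seq_register_spec(seq):
    """A read of x returns the latest value written to x, or 0 if there is none."""
    store = {}
    for kind, x, v in seq:
        if kind == "W":
            store[x] = v
        elif store.get(x, 0) != v:
            return False
    return True


def durably_linearizable_sequential(history):
    """Durable linearizability for complete, non-overlapping calls only."""
    return in_seq_register_spec(ops(history))


persisted = [("W", "x", 1), CRASH, ("R", "x", 1)]
lost = [("W", "x", 1), CRASH, ("R", "x", 0)]
print(durably_linearizable_sequential(persisted))  # True
print(durably_linearizable_sequential(lost))       # False
```

In the second history the completed write is lost across the crash, which durable linearizability forbids.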

Although durable linearizability can specify a large range of persistent data- 
structures, it can be too strong. For example, consider a (memory) register li- 
brary Lwreg that only guarantees that writes to the same location are persisted 
in the order they are observed by concurrent reads. The Lwreg methods comprise 
RegNew() to allocate a new register, RegWrite(x, v) to write v to register x, and 
RegRead(x) to read from register x. The sequential specification Swreg is simple: 
once a register is allocated, a read R on x returns the latest value written to x, 
or 0 if R happens before all writes. The associated durable linearizability speci- 
fication requires that writes be persisted in the linearization order; however, this 
is often not the case on existing hardware, e.g. in Px86 (the Intel-x86 persistency 
model) [30]. 

A more relaxed and realistic specification would consider two linearizations 
of the events: the standard volatile order $<$ and a persistent order nvo expressing 
the order in which events are persisted. The following sections handle this more 
refined model; here we only give a quick taste of the kind of models that 
are implemented by hardware. To capture the same-location guarantees, we 
stipulate a per-location ordering on writes that is respected by both linearizations. 
Specifically, we require an ordering mo of the write calls such that for all 
locations x: 1) restricting mo to x, written $mo_x$, totally orders writes to x; and 
2) $mo_x \subseteq {<}$ and $mo_x \subseteq nvo$. Given a history H, we can then combine these two 
linearizations by using $<$ after the last crash and nvo before. 

Formally, a history H with $n-1$ crashes can be decomposed into n (crash-free) 
eras; i.e. $H = H_1 \cdot ↯ \cdots ↯ \cdot H_n$ where each $H_i$ is crash-free. Let us write $<_i$ 
for ${<} \cap (H_i \times H_i)$ and so forth. We then consider k-sequentializations of the form 
$H^k_\ell = H^{(1)}_\ell \cdots H^{(k-1)}_\ell \cdot H^{(k)}_\ell$, where $H^{(k)}_\ell$ is a sequentialization of $H_k$ w.r.t. $<_k$ 
and $H^{(i)}_\ell$ is a sequentialization of $H_i$ w.r.t. $nvo_i$, for $i < k$. We can now specify 
our weak register library as follows, where H comprises n eras: 


$$H \in \mathcal{L}_{wreg}.S_c \iff \forall k \leq n.\ \exists H^k_\ell\ k\text{-seq. of } H.\ H^k_\ell \in S_{wreg}$$ 


Example 3. The following history is valid according to this specification but not 
according to the durably linearizable one: 


$$W_{t_1}(x,1) \cdot W_{t_2}(y,1) \cdot R_{t_3}(y) \cdot \mathsf{ret}_{t_3}(1) \cdot R_{t_3}(x) \cdot \mathsf{ret}_{t_3}(0) \cdot ↯ \cdot R_{t_4}(y) \cdot \mathsf{ret}_{t_4}(0) \cdot R_{t_4}(x) \cdot \mathsf{ret}_{t_4}(1)$$ 


While the writes to x ($W_{t_1}(x,1)$) and y ($W_{t_2}(y,1)$) are executing, thread $t_3$ 
observes the new value (1) of y but the old value (0) of x; i.e. $<$ must order 
$W_{t_2}(y,1)$ before $W_{t_1}(x,1)$. By contrast, after the crash the new value (1) of x 
but the old value (0) of y is visible; i.e. nvo must order the two writes in the 
opposite order to $<$ ($W_{t_1}(x,1)$ before $W_{t_2}(y,1)$). 
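
The per-location requirement on mo can be checked mechanically once the two linearizations are given. The Python sketch below assumes that both orders are lists over the same set of writes and that each write is identified by a distinct `(location, value)` pair; under these assumptions a suitable mo exists exactly when the two orders agree on the sequence of writes to each location. The function names are hypothetical.

```python
def per_location_order(order):
    """Restrict a total order of writes (location, value) to each location."""
    result = {}
    for loc, val in order:
        result.setdefault(loc, []).append(val)
    return result


def agree_per_location(volatile, nvo):
    """True iff some mo has every restriction mo_x contained in both orders."""
    return per_location_order(volatile) == per_location_order(nvo)


# The two writes of Example 3: the volatile order and nvo disagree globally,
# but the writes are to different locations, so the requirement holds.
volatile = [("y", 1), ("x", 1)]
nvo = [("x", 1), ("y", 1)]
print(agree_per_location(volatile, nvo))  # True

# Two writes to the same location persisted in the opposite order are rejected.
print(agree_per_location([("x", 1), ("x", 2)], [("x", 2), ("x", 1)]))  # False
```

This is why the weak register specification admits Example 3 while still ordering same-location writes consistently.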


Persist Instructions. The persistent registers described above are too weak 
to be practical, as there is no way to control how writes to different locations are 
persisted. In realistic hardware models such as Px86, this control is afforded to 
the programmer using per-location persist instructions (e.g. CLFLUSH), ensuring 
that all writes on a location x persist before a write-back on x. Here, we consider 
a coarser (stronger) variant, denoted by PFENCE, that ensures that all writes 
(on all locations) that happen before a PFENCE are persisted. Later in §3 we 
describe how to specify the behavior of per-location persist operations. 

Formally, we specify PFENCE by extending the specification of Lwreg as follows: 
given history H, write call $c_w$, and PFENCE event $c_f$, if $c_w <_H c_f$, then 
$(c_w, c_f) \in nvo$. 


Example 4. Consider the history obtained from example 3 by adding a PFENCE: 


$$W_{t_1}(x,1) \cdot W_{t_2}(y,1) \cdot R_{t_3}(y) \cdot \mathsf{ret}_{t_3}(1) \cdot R_{t_3}(x) \cdot \mathsf{ret}_{t_3}(0) \cdot \mathsf{PFENCE}_{t_3}() \cdot \mathsf{ret}_{t_3}() \cdot ↯ \cdot R_{t_4}(y) \cdot \mathsf{ret}_{t_4}(0) \cdot R_{t_4}(x) \cdot \mathsf{ret}_{t_4}(1)$$ 


This history is no longer consistent according to the extended specification of 
Lwreg: as PFENCE has completed (returned), all its $<$-previous writes must have 
persisted and thus must be visible after the crash (which is not the case for 
$W_{t_2}(y,1)$). 


2.5 Adding Well-formedness Constraints 


Our next extension is to allow library specifications to constrain the usage of 
the library methods by the client of the library. For example, a library for a 
mutual exclusion lock may require that the “release lock” method is only called 
by a thread that previously acquired the lock and has not released it in between. 
Another example is a transactional library, which may require that transac- 
tional read and write methods are only called within transactions, i.e. between 
a “transaction-begin” and a “transaction-end” method call. 

We call such constraints library well-formedness constraints, and extend the 
library specifications with another component, $S_{wf} \subseteq \mathsf{Hist}(L)$, which records 
the set of well-formed histories of the library. Ensuring that a program produces 
only well-formed histories of a certain library is an obligation of the clients of 
that library, so that the library implementation can rely upon well-formedness 
being satisfied. 
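
A well-formedness constraint of this kind is a predicate on histories. The Python sketch below checks the transactional example on a simplified encoding in which a history is a list of `(thread, method)` pairs; the method names `TxBegin`, `TxEnd`, `TxRead` and `TxWrite` are hypothetical, and rejecting nested or unmatched transactions is an additional assumption of this sketch.

```python
def well_formed(calls):
    """Transactional reads and writes must occur between TxBegin and TxEnd
    of the same thread."""
    in_tx = set()                      # threads currently inside a transaction
    for thread, method in calls:
        if method == "TxBegin":
            if thread in in_tx:        # assumption: no nested transactions
                return False
            in_tx.add(thread)
        elif method == "TxEnd":
            if thread not in in_tx:    # assumption: no unmatched TxEnd
                return False
            in_tx.remove(thread)
        elif method in ("TxRead", "TxWrite") and thread not in in_tx:
            return False
    return True


ok = [("t1", "TxBegin"), ("t2", "TxBegin"), ("t1", "TxWrite"),
      ("t1", "TxEnd"), ("t2", "TxRead"), ("t2", "TxEnd")]
outside = [("t1", "TxBegin"), ("t1", "TxEnd"), ("t1", "TxWrite")]
print(well_formed(ok), well_formed(outside))  # prints: True False
```

A client that produces the second history violates its obligation, so the library implementation is free to behave arbitrarily on it.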


2.6 Tags and Global Specifications 


The goal of our framework is not only to specify libraries in isolation, but also to 
express how a library can enforce persistency guarantees across other libraries. 
For example, consider a library Ltrans for persistent transactions, where all 
operations wrapped within a transaction persist together atomically; i.e. either all 
or none of the operations in a transaction persist. 

The Ltrans methods are: PTNewReg to allocate a register that can be accessed 
(read/written) within a transaction; PTBegin and PTEnd to start and end a trans- 
action, respectively; PTRead(x) and PTWrite(x, v) to read from and write to Ltrans 
register x, respectively; and PTRecover to restore the atomicity of transactions 
whose histories were interrupted by a crash. 
Consider the snippet below, where the calls PEnq(q, 33) (enqueuing 33 into 
persistent queue q) and PSetAdd(s, 77) (adding 77 to persistent set s) are wrapped 
within an Ltrans transaction and thus should take effect atomically and at the 
latest after the end of the call to PTEnd. 


PTBegin();
PEnq(q, 33);
PSetAdd(s, 77);
PTEnd();


Such guarantees are not offered by existing hardware primitives e.g. on Intel- 
x86 or ARMv8 [30,31] architectures. As such, to ensure atomicity, the persis- 
tent queue and set implementations cannot directly use hardware reads/writes; 
rather, they must use those provided by the transactional library whose imple- 
mentation could use e.g. an undo-log to provide atomicity. 

Our framework as described so far cannot express such cross-library persis- 
tency guarantees. The difficulty is that the transactional library relies on other 
libraries using certain primitives. This, however, is against the spirit of compo- 
sitional specification, which precludes the transactional library from referring to 
other libraries (e.g. the queue or set libraries). Specifically, there are two chal- 
lenges. First, both well-formedness requirements and consistency guarantees of 
Ltrans must apply to any method call that is designed to use (transitively) the
primitives of Ltrans. Second, we must formally express atomicity (“all operations
persist atomically”), without Ltrans knowing what it means for a method of an
arbitrary library to persist. In other words, Ltrans needs to introduce an abstract 
notion of ‘having persisted’ for an operation, and guarantee that all methods in 
a transaction ‘persist’ atomically. 

To remedy this, we introduce the notion of tags. Specifically, to address the 
first challenge, the transactional library provides the tag T to designate those 
operations that are ‘transaction-aware’ and as such must be used inside a trans- 
action. To address the second challenge, the transaction library provides the 
tag P^tr, denoting an operation that has abstractly persisted. The specification
of Ltrans then guarantees that all operations tagged with T inside a transaction
persist atomically, in that either they are all tagged with P^tr or none of them
are. Dually, using the well-formedness condition, Ltrans requires that all oper- 
ations tagged with T appear inside a transaction. Note that as the persistent 
queue and set libraries tag their operations with T, verifying their implementa- 
tions incurs related proof obligations; we will revisit this later when we formalize 
the notion of library implementations. 


Remark 1 (Why bespoke persistency?). The reader may question why ‘having 
persisted’ is not a primitive notion in our framework, as in an existing model of 
Px86 [19] where histories track the set P of persisted events. This is because asso- 
ciating a Boolean (‘having persisted’) flag with an operation may not be sufficient 
to describe whether it has persisted. To see this, consider a library Lpair with
operations Write(x, l, r) (writing (l, r) to pair x), Readl(x) and Readr(x) (reading
the left and right components of x, respectively). Suppose Lpair is implemented 
by storing the left component in an Ltrans register and the right component in
a Lwreg register. The specification of Lpair would need to track the persistence of 
each component separately, and hence a single set P of persisted events would 
not suffice. 


Let us see how libraries can use these tags in global well-formedness and 
consistency specifications. The dilemma is, on the one hand, the specification 
of Ltrans needs to refer to events from other libraries, but on the other hand, it
should not depend on other libraries to preserve encapsulation. Our idea is to 
anonymize these external events such that the global specification depends only 
on their relevant tags. A library should only rely on the tags it introduces itself, 
as well as the tags of the libraries it uses. 

We now revisit several of our definitions to account for tags and global spec- 
ifications. A library interface now additionally holds the tags it introduces as 
well as those it uses. For instance, the Ltrans library described above depends on 
no tag and introduces tags T and P^tr.


Definition 8 (Interfaces). An interface is a tuple L = (M, Mc, loc, TAGSnew,
TAGSdep), where M, Mc, and loc are as in Def. 1, TAGSnew is the set of tags
L introduces, and TAGSdep is the set of tags L uses. The set of tags usable by L
is Tags(L) ≜ L.TAGSnew ∪ L.TAGSdep.


We next define the notion of tagged method invocations (where a method in- 
vocation is associated with a set of tags). Hereafter, our notions of events, history 
(and so forth) use tagged method invocations (rather than method invocations).


Definition 9. Given a library interface L, a tagged method invocation is of the
form m(v)^T_t : v_⊥, where the new component is a set of tags T ⊆ Tags(L).


A global specification of a library interface L is a set of histories with some 
“anonymized” events. These are formalized using a designated library interface, 
⋆_L (with a single method ⋆), which can be tagged with any tag from Tags(L).


Definition 10. Given an interface L, the interface ⋆_L is ({⋆}, ∅, ∅, ∅, Tags(L)).


Now, given any history H ∈ Hist({L} ∪ A), let π_L(H) ∈ Hist({L, ⋆_L}) denote
the anonymization of H such that each non-L event e in H labelled with a
method m(v)^T_t : v_⊥ of L′ ∈ A is replaced with ⋆^T of ⋆_L if T ≠ ∅ and is discarded
otherwise. It is then straightforward to extend the notion of libraries with global
specifications as follows.


Definition 11. A library specification L is a tuple (L, Atags, Sc, Swf, Tc, Twf),
where L, Sc and Swf are as in Def. 4; Tc and Twf ⊆ Hist({L, ⋆_L}) are the globally
consistent and globally well-formed histories, respectively; and Atags denotes the
tag-dependencies, i.e. a collection of libraries that provide all tags that L uses:
L.TAGSdep ⊆ ⋃_{L′ ∈ Atags} L′.TAGSnew. Both Twf and Tc contain the empty history.
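
The anonymization step used by global specifications can be sketched in a few lines of Python. This is only an illustration under assumed encodings: the `Event` record, the `STAR` placeholder and the function name are hypothetical, and events carry only a library name, a method name and a tag set.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Event:
    library: str                       # name of the library the method belongs to
    method: str                        # method name
    tags: FrozenSet[str] = frozenset() # tags attached to the invocation


STAR = "*"  # hypothetical name for the designated anonymous interface and method


def anonymize(history: List[Event], lib: str) -> List[Event]:
    """Keep events of `lib`; replace tagged foreign events by an anonymous
    event carrying the same tags; discard untagged foreign events."""
    result = []
    for e in history:
        if e.library == lib:
            result.append(e)
        elif e.tags:
            result.append(Event(STAR, STAR, e.tags))
        # untagged events of other libraries are dropped
    return result
```

After this step a global specification can only observe which tags a foreign event carries, not which library or method produced it.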


In the context of a history, we write ⌊T⌋ for the set of events or calls tagged
with the tag T (we consider a return event tagged the same way as its unique 
matching invocation). 

For the Ltrans library, the globally well-formed set Ltrans.Twf comprises histories
H such that for each thread t, H[t] restricted to PTBegin, PTEnd and
T-tagged events is of the form described by the regular expression
(PTBegin.⌊T⌋*.PTEnd)*. In particular, transaction nesting is disallowed in our
simple Ltrans library.
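
A minimal Python sketch of this well-formedness check follows. It assumes a history is a list of (thread, method, tags) triples, which is an encoding chosen for illustration only; each thread's restricted history is turned into a word over B (PTBegin), T (a T-tagged event) and E (PTEnd) and matched against the regular expression.

```python
import re
from collections import defaultdict


def trans_well_formed(history):
    """history: list of (thread, method, tags) triples in history order."""
    per_thread = defaultdict(str)
    for thread, method, tags in history:
        if method == "PTBegin":
            per_thread[thread] += "B"
        elif method == "PTEnd":
            per_thread[thread] += "E"
        elif "T" in tags:
            per_thread[thread] += "T"
        # all other events are ignored by the restriction
    # every thread's word must match (PTBegin . T* . PTEnd)*
    return all(re.fullmatch(r"(BT*E)*", word) for word in per_thread.values())
```

The sketch follows the regular expression literally, so a history that stops in the middle of a transaction is rejected; the empty history is accepted.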

To define global consistency, we need to know when two operations are part
of the same transaction. Given a history H, we define the same-transaction
relation, strans, relating pairs of e, e′ ∈ ⌊T⌋ ∪ PTEnd ∪ PTBegin executed by the
same thread t such that there is no PTBegin or PTEnd executed by t between
them. The set Ltrans.Tc of globally consistent histories contains histories H such
that ∀(e, e′) ∈ strans. e ∈ ⌊P^tr⌋ ⇒ e′ ∈ ⌊P^tr⌋, and all completed PTEnd calls are
in ⌊P^tr⌋. Since the PTEnd call is related to all events inside its transaction, this
specification does express that (1) a transaction persists by the time the call to
PTEnd finishes and (2) all events persist atomically.
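
The all-or-nothing reading of this predicate can be sketched in Python as follows. The sketch makes several assumptions that are not in the text: events are (thread, method, tags, completed) tuples, the persistence tag is spelled "Ptr", the history is already well-formed, and events are grouped per transaction rather than through an explicit same-transaction relation.

```python
def trans_globally_consistent(history):
    """history: list of (thread, method, tags, completed) tuples in history order."""
    open_txn = {}  # thread -> persisted flags of the events in its current transaction
    for thread, method, tags, completed in history:
        persisted = "Ptr" in tags
        if method == "PTBegin":
            open_txn[thread] = [persisted]
        elif method == "PTEnd":
            flags = open_txn.pop(thread, []) + [persisted]
            if completed and not persisted:
                return False  # a completed PTEnd must have persisted
            if any(flags) and not all(flags):
                return False  # the transaction persisted only partially
        elif "T" in tags:
            open_txn.setdefault(thread, []).append(persisted)
    # transactions interrupted before PTEnd must also be all-or-nothing
    return all(all(flags) or not any(flags) for flags in open_txn.values())
```

A transaction cut short by a crash is accepted when none of its events persisted, and rejected when only some of them did.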

Finally, we need to define the local consistency predicate Ltrans.Sc describing
the behavior of the registers provided by Ltrans. This is where we define the
concrete meaning of ‘having persisted’ for these registers. Let S be the sequential
specification of a register. Let H ∈ Hist(Ltrans) be a history decomposed into k
eras as H1 · ↯ · H2 · ↯ ⋯ ↯ · Hk. Then H ∈ Ltrans.Sc iff all events are tagged
with T, and there exists a <-linearization Hℓ of ((H1 · ↯ · H2 · ↯ ⋯ ↯ · Hk−1) ∩
⌊P^tr⌋) · Hk such that Hℓ ∈ S, where ⌊P^tr⌋ is the set of events of H tagged
with P^tr. In other words, a write operation is seen after a crash iff it has persisted.
The requirement that such operations must appear within transactions and the 
guarantee that they persist at the same time in a transaction are covered by the 
global specifications. 


2.7 Library Implementations 


We have described how to specify persistent libraries in our framework, and 
next describe how to implement persistent libraries. This is formalized by the 
judgment A ⊢ I : L, stating that I is a correct implementation of library L
and only uses calls in the collection of libraries A. As usual in such ‘layered’ 
frameworks [13,26], the base layer, which represents the primitives of the hard- 
ware, is specified as a library, keeping the framework uniform. This judgement 
can be composed vertically as follows, where I[I_L] denotes replacing the calls
to library L in I with their implementations given by I_L (which in turn calls
libraries A′):


$$\frac{A, L \vdash I : L' \qquad A' \vdash I_L : L}{A, A' \vdash I[I_L] : L'}$$


As we describe later, this judgment denotes contextual refinement and is im- 
practical to prove directly. We define a stronger notion that is compositional 
and more practical to use. 


Definition 12 (Implementation). Given a collection A of libraries and a library
L, an implementation I of L over A is a map, I : L.M × Val_⊥ → P(Hist(A)),
such that it is downward-closed: 1) if H ∈ I(m(v)_t : v_⊥) and H′ is a
prefix of H, then H′ ∈ I(m(v)_t : ⊥); and 2) each I(m(v)_t : v_⊥) history only contains
events by thread t.


    globals log := Q.new()

    method PTNewReg() := alloc(1)

    method PTRead(l) := read(l)

    method PTWrite(l, v) :=
      Q.append(log, (l, v));
      write(l, v)

    method PTBegin() := FENCE();

    method PTEnd() :=
      Q.append(log, COMMITTED);
      FENCE()

    method PTRecover() :=
      let w = Q.new() in
      while (x := Q.pop(log))
        if (x = COMMITTED)
          w = Q.new();
        else
          Q.append(w, x);
      while ((l, v) = Q.pop(log)) {
        write(l, v); }


Fig. 1. Implementation of Ltrans


Intuitively, I(m(v)_t : v_⊥) contains the histories corresponding to a call m(v) with
outcome v_⊥, where v_⊥ = ⊥ denotes that the call has not terminated yet and
v_⊥ = v ∈ Val denotes the return value. Downward-closure means that an implementation
contains all partial histories. We use a concrete programming language
to write these implementations; its syntax and semantics are standard and given 
in the appendix [34]. 

For example, the implementation of Ltrans over Lwreg and LQueue is given in 
Fig. 1. The idea is to keep an undo-log as a persistent queue that tracks the 
values of the variables before the transaction begins. At the end of a transaction, 
and after all its writes have persisted, we write the sentinel value COMMITTED to 
the log to indicate that the transaction was completed successfully. After a crash, 
the recovery routine PTRecover returns the undo-log and undoes the operations 
of incomplete transactions by writing their previous values. 
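
The undo-log idea can be illustrated with a small single-threaded Python simulation. This is a sketch under strong simplifying assumptions, not the implementation of Fig. 1: every store is assumed to persist immediately and in order (so no fences are modelled), the log records the value a register held before each write (following the prose description above), and all class and method names are hypothetical.

```python
COMMITTED = object()  # sentinel marking the end of a completed transaction


class TransactionalRegisters:
    def __init__(self):
        self.mem = {}      # persistent registers: location -> value
        self.log = []      # persistent undo-log: (location, old value) or COMMITTED
        self.next_loc = 0

    def new_reg(self):
        loc = self.next_loc
        self.next_loc += 1
        self.mem[loc] = 0
        return loc

    def read(self, loc):
        return self.mem[loc]

    def write(self, loc, value):
        self.log.append((loc, self.mem[loc]))  # log the old value before overwriting
        self.mem[loc] = value

    def end(self):
        self.log.append(COMMITTED)  # the transaction completed successfully

    def recover(self):
        # keep only the entries logged after the last COMMITTED sentinel
        pending = []
        for entry in self.log:
            if entry is COMMITTED:
                pending = []
            else:
                pending.append(entry)
        # undo the incomplete transaction, most recent write first
        for loc, old in reversed(pending):
            self.mem[loc] = old
        self.log = []
```

Undoing in reverse order matters when a register is written several times within the interrupted transaction: the oldest logged value is restored last, which is the value the register held when the transaction began.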


Histories and Implementations. An implementation I of L over A is
correct if for all histories H ∈ Hist({L} ∪ A′) that use library L as well as
those in A′, and all histories H′ obtained by replacing calls to L methods with
their implementation in I, if H′ is consistent, then so is H (it satisfies the L
specification).

We define the action H · I of an implementation I on an abstract history H in
a ‘relational’ way: H′ ∈ H · I when we can match each operation m′(v) in H′ with
some operation f(m′(v)) in H in such a way that the collection f⁻¹(m(v)_t : v_⊥)
of operations corresponding to some call m(v)_t : v_⊥ in H agrees with I(m(v)_t : v_⊥).


Definition 13. Let I be an implementation of L over A; let H ∈ Hist({L} ∪ A′)
and H′ ∈ Hist(A ∪ A′) be two histories. Given a map f : {1, ..., |H′|} →
{1, ..., |H|}, H′ (I, f)-matches H if the following hold:


1. f is surjective;
2. for all invocations m(v)_t of H, if m(v)_t ∉ L.M, then f(m(v)_t) = m(v)_t;
3. for all threads t, if e1 precedes e2 in H′[t], then f(e1) precedes f(e2) in H[t];
4. for all calls m(v)_t : v_⊥ of H, the set f⁻¹(m(v)_t) corresponds to a substring H′_m
   of H′[t] and H′_m ∈ I(m(v)_t : v_⊥), where v_⊥ is the (optional) return value
   of m(v)_t in H.


The action of I on a history H is defined as follows: 


H · I = {H′ | ∃f. H′ (I, f)-matches H}.


Condition 1 ensures that all events of the abstract history are matched with an 
implementation event; condition 2 ensures that the events that do not belong to 
the library being implemented (L) are left untouched, and condition 3 ensures 
that the thread-local order of events in the implementation agrees with the one in 
the specification. The last condition (4) states that the events corresponding to 
the implementation of a call m(v) are consecutive in the history of the executing 
thread t, and correspond to the implementation I.


Well-formedness and Consistency. Recall that libraries specify both how 
they should be used (well-formedness), and what they guarantee if used cor- 
rectly (consistency). Using these specifications (expressed as sets of histories) 
to define implementation correctness is more subtle than one might expect. 
Specifically, if we view a program using a library L as a downward-closed set 
of histories in Hist(L), we cannot assume all its histories are in the set L.Swf of 
well-formed histories, as the semantics of the program will contain unreachable 
traces (see [26]). To formalize reachability at a semantic level, we define heredi- 
tary consistency, stating that each step in the history was consistent, and thus 
the current ‘state’ is reachable. 


Definition 14 (Consistency). History H ∈ Hist(A) is consistent if for all L ∈
A, H[L] ∈ L.Sc and π_L(H) ∈ L.Tc. It is hereditarily consistent if all H[1..k] are
consistent, for k < |H|.


This definition uses the ‘anonymization’ operator π_L, defined in §2.6, to test that
the history H follows the global consistency predicates of every L € A. 

We further require that programs using libraries respect encapsulation, de- 
fined below, stating that locations obtained from a library constructor are only 
used by that library instance. Specifically, the first condition ensures that dis- 
tinct constructor calls return distinct locations. The second condition ensures 
that a non-constructor call e of L uses locations that have been allocated by an 
earlier call c (c < e) to an L constructor. 


Definition 15 (Encapsulation). A history H ∈ Hist(A) is encapsulated if
the following hold, where C denotes the set of calls to constructors in H:


1. for all c, c′ ∈ C, if c ≠ c′, then loc(c) ∩ loc(c′) = ∅;
2. for all e ∈ H \ C, if loc(e) ≠ ∅, then there exist c ∈ C, L ∈ A such that
   e, c ∈ L.M, c < e and loc(e) ⊆ loc(c).
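
The two encapsulation conditions can be checked directly on a totally ordered history. The Python sketch below assumes each event is a triple (library, is_constructor, locs), where locs is the set of locations the call returns or uses; this encoding and the function name are illustrative assumptions.

```python
def encapsulated(history):
    """history: list of (library, is_constructor, locs) triples in history order."""
    allocated = []  # (library, locs) of the constructor calls seen so far
    for library, is_constructor, locs in history:
        locs = frozenset(locs)
        if is_constructor:
            # condition 1: distinct constructor calls return disjoint locations
            if any(locs & other for _, other in allocated):
                return False
            allocated.append((library, locs))
        elif locs:
            # condition 2: locations used by a non-constructor call were allocated
            # by an earlier constructor call of the same library
            if not any(lib == library and locs <= other for lib, other in allocated):
                return False
    return True
```

Scanning the history in order makes the requirement that the constructor call precedes the use implicit: only constructors already seen can justify a later call.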

We can now define when a history of A is immediately well-formed: it must
be encapsulated and be well-formed according to each library in A and all the 
tags it uses. 


Definition 16. History H ∈ Hist(A) is immediately well-formed if the following
hold:


1. H is encapsulated;

2. H[L] ∈ L.Swf, for all L ∈ A; and

3. π_L(H) ∈ L.Twf for all L ∈ TagDep(A), where the immediate dependencies
   TagDep(A) are defined as ⋃_{L ∈ A} {L} ∪ Atags(L).


We finally have the notions required to define a correct implementation. 


Implementation Correctness. As usual, an implementation is correct if 
all behaviors of the implementation are allowed by the specification. In our 
setting, this means that if a concrete history is hereditarily consistent, so should 
the abstract history. Moreover, assuming the abstract history is well-formed, all 
corresponding concrete histories should also be well-formed; this corresponds to 
the requirement that the library implementation uses its dependencies correctly, 
under the assumption that the program itself uses its libraries correctly. 


Definition 17 (Correct implementation). An implementation I of L over A
is correct, written A ⊢ I : L, if for all collections A′, all ‘abstract’ histories H ∈
Hist({L} ∪ A′) and all ‘concrete’ histories H′ ∈ H · I ⊆ Hist(A ∪ A′), the
following hold: 


1. if H is immediately well-formed, then H’ is also immediately well-formed; 
and 

2. if H’ is immediately well-formed and hereditarily consistent, then H is con- 
sistent. 


This definition is similar to contextual refinement in that it quantifies over all 
contexts: it considers histories that use arbitrary libraries as well as those that 
concern I directly. We now present a more convenient, compositional method for 
proving an implementation correct, which allows one to only consider libraries 
and tags that are used by the implemented library. 


2.8 Compositionally Proving Implementation Correctness 


Recall that in this section we present our framework in a simplified sequentially 
consistent setting; later in §3 we generalize our framework to the weak memory
setting. We introduce the notion of compositional correctness, simplifying
the global correctness conditions in Def. 17. Specifically, while Def. 17 considers 
histories with arbitrary libraries that may use tags introduced by L, our com- 
positional condition requires one to prove that only those L methods that are 
L-tagged satisfy L.Tc.


Definition 18 (Compositional correctness). An implementation I of L
over A is compositionally correct if the following hold:


1. For all A′, H ∈ Hist({L} ∪ A′) and H′ ∈ H · I ⊆ Hist(A ∪ A′), if H is
   well-formed, then H′ is well-formed;

2. For all H ∈ Hist(L) and H′ ∈ H · I ⊆ Hist(A), if H′ is well-formed and
   hereditarily consistent, then H ∈ L.Sc ∩ L.Tc; and

3. For all L′ ∈ A, H ∈ Hist({L, L′, ⋆_{L′}}) and H′ ∈ H · I, if π_{L′}(H′) ∈
   L′.Tc, then π_{L′}(H) ∈ L′.Tc.


The preservation of well-formedness (condition 1) does not change compared to 
its counterpart in Def. 17, as in practice this condition is easy to prove directly. 
Condition 2 requires one to prove that the implementation is correct in isolation 
(without A’). Condition 3 requires one to prove that global consistency require- 
ments are maintained for all dependencies of the implementation. In practice, 
this corresponds to proving that those L operations tagged with existing tags 
in A obey the global specifications associated with these tags. Intuitively, the 
onus is on the library that uses a tag for its methods to prove the associated 
global consistency predicate: we need not consider unknown methods tagged 
with tags in L.TAGSnew. 

Finally, we show that it is sufficient to show an implementation I is compo- 
sitionally correct as it implies that J is correct. 


Theorem 1 (Correctness). If an implementation I of L over A is composi- 
tionally correct (Def. 18), then it is also correct (Def. 17). 


Example 5 (Transactional Library Ltrans). Consider the implementation Itrans
of Ltrans over A = {Lwreg, LQueue} given in Fig. 1, and let us assume we were
to show that Itrans is compositionally correct. Our aim here is only to outline the
proof obligations that must be discharged; later in §5 we give a full proof in the 
more general weak memory setting. 


1. For the first condition of compositional correctness, we must show Itrans
preserves well-formedness: if the abstract history H is well-formed, then so
is any corresponding concrete history H′ ∈ H · Itrans. This is straightforward
as the well-formedness conditions of Lwreg and LQueue are trivial, and Ltrans
does not use any existing tag.

2. For the second condition of compositional correctness, we must show that
Itrans preserves consistency in the other direction: keeping the notations as
above, assuming H′ is consistent for A, then H is consistent as specified
by Ltrans. There are two parts to this obligation, as we also have to show that
the Ltrans operations tagged with T satisfy the global consistency predicate
of the library.

3. The last condition holds vacuously as Ltrans does not use any existing tags.


Example 6 (A Client of Ltrans). To see how the global consistency specifications
work, consider a simple min-max counter library, Lmmcnt, tracking the maximal
and minimal integer it has been given. The Lmmcnt library is to be used within
Ltrans transactions, and provides four methods: mmNew() to construct a min-max
counter, mmAdd(x, n) to add integer n to the min-max counter, and mmMin(x)
and mmMax(x) to read the respective values.


    method mmNew() :=
      (PTNewReg(), PTNewReg())

    method mmAdd(x, n) :=
      PTWrite(x.1, min(n, PTRead(x.1)))
      PTWrite(x.2, max(n, PTRead(x.2)))

    method mmMin(x) :=
      PTRead(x.1)

    method mmMax(x) :=
      PTRead(x.2)


Fig. 2. Implementation Immcnt of Lmmcnt


We present the Immcnt implementation over Ltrans in Fig. 2. The idea is simply
to track two integers denoting the minimal and maximal values of the numbers
that have been added. Interestingly, even though they are stored in Ltrans
registers, the implementation does not begin or end transactions: this is the
responsibility of the client, so as to avoid nesting transactions. This is enforced by Lmmcnt
using a global well-formedness predicate. Moreover, the mmAdd operation is tagged
with T from the Ltrans library, ensuring that it behaves well w.r.t. transactions.
A non-example is a version of Immcnt where the minimum is in an Ltrans register,
but the maximum is in a “normal” Lwreg register. This breaks the atomicity guarantee
of transactions.


Formally, the interface Lmmcnt has four methods as above, where mmNew is
the only constructor. The set of used tags is TAGSdep = {T, P^tr}, and all Lmmcnt
methods are tagged with T as they all use primitives from Ltrans. The consistency
predicate is defined using the obvious sequential specification Smmcnt, which
states that calls to mmMin return the minimum of all integers previously given
to mmAdd in the sequential history. We lift this to (concurrent) histories as follows.
A history H ∈ Hist(Lmmcnt) is in Lmmcnt.Sc if there exists Eℓ ∈ Smmcnt that is a
<-linearization of E1[P^tr] · E2[P^tr] ⋯ En−1 · En[P^tr], where H consists of n eras
decomposed as H = E1 · ↯ ⋯ ↯ · En (recall that E[P^tr] denotes the sub-history
with events tagged with P^tr, that is, persisted events). The global specification
and well-formedness conditions of Lmmcnt are trivial. Because Lmmcnt uses tag T
of Ltrans, a well-formed history of Lmmcnt must satisfy Ltrans.Twf, which requires
that all operations tagged with T be inside transactions, and Ltrans.Tc guarantees
that Lmmcnt operations persist atomically in a transaction.


When proving that the implementation in Figure 2 satisfies Lmmcnt using
compositional correctness, one proof obligation is to show that, given histories
H ∈ Hist({Ltrans, Lmmcnt, ⋆_Ltrans}) and H′ ∈ H · Immcnt ⊆ Hist({Ltrans, ⋆_Ltrans}), if
π_Ltrans(H′) ∈ Ltrans.Tc, then π_Ltrans(H) ∈ Ltrans.Tc. This corresponds precisely to
the fact that min-max counter operations persist atomically in a transaction,
assuming the primitives it uses do as well.


2.9 Generic Durable Persistency Theorems


We consider another family of libraries with persistent reads/writes guaranteeing 
the following: 


if one replaces regular (volatile) reads/writes in a linearizable implemen- 
tation with persistent ones, then the implementation obtained is durably 
linearizable. 


We consider two such libraries: FliT [35] and Mirror [10]. Thanks to our
framework, we formalise the statement above for the first time and prove it for
both FliT and Mirror against a realistic consistency (concurrency) model (see
§4).


3 Generalization to Weak Memory


This section sketches how we generalize the framework presented in the previous
section to the weak memory setting, where events generated by the program are not
totally ordered. For lack of space, the technical details, which largely follow those
of the previous section, are relegated to the Appendix [34]. The purpose of this
section is to give an idea of how executions, a standard tool in the semantics of 
weak memory, generalize the histories we used in the Overview section, and to 
give enough context for the case studies that follow. 

Unlike the histories that we discussed in the previous section, in which events
are totally ordered by a notion of time, events in executions are only partially
ordered, reflecting that instructions executed in parallel are not naturally ordered.
Formally, an execution is thus a set of events equipped with a partial order which
represents the ordering between events from the same thread. This partial order,
written po, for program-order, is depicted with black arrows in Fig. 3, where it
orders minimally the initial event, and the two events of each thread according to
the source code. Additional edges indicate, for each read-event returning the value v,
the write-event that provided the value v: in that case, an rf-edge from the
write-event to the read-event is added to the execution.

[Figure: execution graph with the initial event [init], followed in po by R(z):5
and W(y,2):() in the first thread, and by R(y):0 and W(z,5):() in the second thread.]

Fig. 3. An execution of the program a = z; y = 2 ∥ a = y; z = 5

To be able to reason about synchronization, the notion of happens-before 
needs to be adapted to this setting. It is defined using po and an additional 
type of edge, synchronizes-with, written sw, which denotes that two events syn- 
chronize with each other, and in particular that one happens before the other. 
Usually, sw ⊆ rf, for example between a release-write and an acquire-read in
the C11 memory model. Given these sw edges, the happens-before order they
induce, which generalizes < from the previous section, is defined as the transitive
closure (po ∪ sw)⁺. This is not sufficient however, because we consider partial
executions G where the focus is on a subset of the libraries in some unknown
global execution G′, that is: G = G′|L. Therefore, external events (in G′ but
not in G) may induce happens-before relations between events of G, yet we want
to specify library L without referring to any such execution G′ that contains it.
To solve this issue, we use the technique of [26], and we add a final type of edge
to executions: hb, which corresponds to both the external and the internal
synchronization. Because of the latter, it must contain the internal synchronization:
po ∪ sw ⊆ hb.

To summarize, an execution is a tuple (E, po, rf, sw, hb) comprised of a set E
of events, and of the relations we just described. A library specification is the
same as in the previous section, mutatis mutandis. The sets of executions that
are parts of specifications are defined using a formalism developed in the weak
memory model literature. A set S of executions is described with conditions
about relations built from po, rf, etc. Given a set V of events, we denote by
[V] the identity relation {(e, e) | e ∈ V} on V, and we denote by R1; R2 the standard
composition of two relations R1 and R2. For example, if R denotes the set of read-events of an
execution and W the set of write-events, the condition [W]; rf; [R] ⊆ sw states
that if there is an rf-edge between two events e1 ∈ W and e2 ∈ R of an execution,
there must also be a sw synchronization edge between e1 and e2.
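
Such relational conditions are easy to evaluate on a concrete execution. The Python sketch below represents relations as sets of pairs and reads [V] as the identity relation on V, which is the usual convention in the weak memory literature and an assumption here; the event names and helper functions are hypothetical.

```python
def identity(events):
    """[V]: the identity relation restricted to the events in V."""
    return {(e, e) for e in events}


def compose(r1, r2):
    """R1; R2: pairs (a, c) such that (a, b) is in R1 and (b, c) is in R2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


# A tiny execution: one write and one read of z, related by rf.
W = {"w_z"}
R = {"r_z"}
rf = {("w_z", "r_z")}
sw = {("w_z", "r_z")}

# [W]; rf; [R] keeps exactly the rf-edges going from a write to a read.
restricted = compose(compose(identity(W), rf), identity(R))
holds = restricted <= sw  # the condition [W]; rf; [R] ⊆ sw
```

With the sw relation above the condition holds; removing the edge from sw makes it fail, since the rf-edge is then no longer covered.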

As in the previous section, the tag system allows the library specification to 
state which events must have been persisted in a valid execution. The semantics 
of a program is a set of executions that contain events from all the libraries used 
by the program, and whose happens-before order satisfies hb = (po ∪ sw)⁺, as
there is no external synchronization in the executions of the whole program.
The Appendix [34] details how our framework is defined in this more general 
setting. 
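
For a whole-program execution, happens-before can be computed as a transitive closure. The following Python sketch does so for an execution shaped like the one in Fig. 3; the event names and the single sw edge are assumptions chosen for illustration, since which rf-edges synchronize depends on the memory model.

```python
def transitive_closure(rel):
    """Smallest transitive relation containing rel (a set of pairs)."""
    closure = set(rel)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


# program order: the initial event, then the two events of each thread
po = {("init", "r_z"), ("init", "r_y"), ("r_z", "w_y"), ("r_y", "w_z")}
# hypothetical synchronization edge from the write of z to the read of z
sw = {("w_z", "r_z")}

hb = transitive_closure(po | sw)
```

Here the second thread's read of y happens before the first thread's write of y, through the chain r_y, w_z, r_z, w_y, while the converse does not hold.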


4 Case Study: Durable Linearizability with FliT and Mirror


We consider a family of libraries that provide a simple interface with persistent 
memory accesses (reads and writes), allowing one to convert any linearisable 
implementation to a durably linearisable one by replacing regular (volatile) ac- 
cesses with persistent ones supplied by the library. Specifically, we consider two 
such libraries FliT [35] and Mirror [10]; we specify them both in our framework, 
prove their implementations sound against their respective specifications, and 
further prove their general result for converting data structures. 


4.1 The FliT Library 


FliT [35] is a persistent library that provides a simple interface very close to
Px86, but with stronger persistency guarantees, which make it easier to implement
durable data structures. Specifically, a FliT object ℓ can be accessed via
write and read methods, wr_π(ℓ, v) and rd_π(ℓ), as well as standard read-modify-
write methods. Each write (resp. read) operation has two variants, denoted by
the type π ∈ {p, v}. This type specifies if the write (resp. read) is persistent
(π = p) in that its effects must be persisted, or volatile (π = v) in that its
persistency has been optimised and offers weaker guarantees. The default access
type is persistent (p), and the volatile accesses may be used as optimizations
when weaker guarantees suffice. Wei et al. [35] introduce a notion of dependency
between different operations as follows. If a (persistent or volatile) write w depends
on a persistent write w′, then w′ persists before w. If a persistent read
r reads from a persistent write w, then r depends on w and thus w must be
persisted upon reading if it has not already persisted. Though simple, FliT provides
a strong guarantee as captured by a general result for correctly converting
volatile data structures to persistent ones: if one replaces every memory access
in the implementation of a linearizable data-structure with the corresponding
persistent FliT access, then the resulting data structure is durably linearizable.


    method wr_π(ℓ, v):
      if π = p then
        fetch-and-add(flit-counter(ℓ), 1);
        write(ℓ, v);
        flushopt(ℓ);
        fetch-and-add(flit-counter(ℓ), −1);
      else
        write(ℓ, v);

    method rd_π(ℓ):
      local v = read(ℓ);
      if π = p ∧ flit-counter(ℓ) > 0 then
        flushopt(ℓ);
      return v;

    method finishOp:
      sfence;


Fig. 4. FliT library implementation in Px86

Compared to the original FliT development, our soundness proof is more 
formal and detailed: it is established against a formal specification (rather than 
an English description) and with respect to the formal Px86 model. 


FliT Interface. The FliT interface uses the persistency tag of Px86 and contains
a single constructor, new, allocating a new FliT location, as well as three other
methods below, the last two of which are durable:


- rd_π(ℓ) with π ∈ {p, v}, for a π-read from ℓ;
- wr_π(ℓ, v) with π ∈ {p, v}, denoting a π-write of value v ∈ Val to ℓ; and
- finishOp, which waits for previously executed operations to persist.


We write R and W respectively for the read and write events, and add the 
superscript π (e.g. R^p) to denote such events with the given persistency mode.


FliT Specification. We develop a formal specification of FliT in our frame- 
work, based on its original informal description. The correctness of FliT execu- 
tions is described via a dependency relation that contains the program order and 
the total execution (linearization) order restricted to persistent write-read oper- 
ations on the same location. Note that this dependency notion is stronger than 
the customary definitions that use a rf relation (as in the Px86 specification) 
instead of lin, because a persistent read may not read directly from a persistent 
write w, but rather from another later (lin-after w) write. 


Definition 19 (FliT execution correctness). A FliT execution G is correct
if there exists a ‘reads-from’ relation rf and a total order lin ⊇ G.hb on G.E and
an order nvo such that:


1. Each read event reads from the most recent previous write to the same location:
   $rf = \bigcup_{\ell \in Loc} ([W_\ell]; lin; [R_\ell]) \setminus (lin; [W_\ell]; lin)$
2. Reads return the value written by the write they read from:
   $(w, r) \in rf \Rightarrow \exists \ell, \pi, \pi', v.\ lab(r) = rd_{\pi}(\ell) : v \land lab(w) = wr_{\pi'}(\ell, v) : -$
3. Persistent writes persist before every other later dependent write:
   $[W^{p}]; (po \cup \bigcup_{\ell \in Loc} [W^{p}_\ell]; lin; [R^{p}_\ell])^{+}; [W] \subseteq nvo$
4. Persistent writes before a finishOp persist:
   $dom([W^{p}]; (po \cup \bigcup_{\ell \in Loc} [W^{p}_\ell]; lin; [R^{p}_\ell])^{+}; [FinishOp]) \subseteq \lfloor P_{tag} \rfloor$
5. And nvo is a persist order: $dom(nvo; [\lfloor P_{tag} \rfloor]) \subseteq \lfloor P_{tag} \rfloor$.


Px86 implementation of FliT. The implementation of FliT methods is 
given in Fig. 4. Whereas a naive implementation of this interface would have to 
issue a flush instruction both after persistent writes and in persistent reads, the 
implementation shown associates each location with a counter to avoid perform- 
ing superfluous flushes when reading from a location whose value has already 
persisted. Specifically, a persistent write on $\ell$ increments its counter before writing
to and flushing it, and decrements the counter afterwards. As such, persistent
reads only need to issue a flush if the counter is positive (i.e. if there is a con- 
current write that has not executed its flush yet). 
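
The counter scheme described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the implementation of Fig. 4: the class names `SimulatedNVM` and `FlitLocation` are hypothetical, persistent memory is modeled by two dictionaries, and the code is single-threaded (a real implementation would need atomic counter updates and the appropriate fences).

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedNVM:
    """Toy model (an assumption): a volatile cache in front of a persisted store."""
    cache: dict = field(default_factory=dict)
    persisted: dict = field(default_factory=dict)
    flushes: int = 0

    def store(self, loc, value):
        self.cache[loc] = value

    def load(self, loc):
        return self.cache.get(loc, self.persisted.get(loc))

    def flush(self, loc):
        # Write the cached value (if any) back to the persisted store.
        if loc in self.cache:
            self.persisted[loc] = self.cache[loc]
        self.flushes += 1


class FlitLocation:
    """Hypothetical location with a counter of not-yet-flushed persistent writes."""

    def __init__(self, mem, loc):
        self.mem = mem
        self.loc = loc
        self.counter = 0

    def write_persistent(self, value):
        self.counter += 1            # announce a pending flush
        self.mem.store(self.loc, value)
        self.mem.flush(self.loc)
        self.counter -= 1            # the written value has been flushed

    def read_persistent(self):
        value = self.mem.load(self.loc)
        if self.counter > 0:         # some write may not have flushed yet
            self.mem.flush(self.loc)
        return value
```

In this model a persistent read after a completed persistent write issues no flush at all, which is the saving the counter is meant to provide; the flush in `read_persistent` only fires while some write is between its increment and its decrement.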


Theorem 2. The implementation of FliT in Fig. 4 is correct. 


FliT and Durable Linearizability. Given a data structure implementation
I, let p(I) denote the implementation obtained from I by 1) replacing
reads/writes in the implementation with their corresponding persistent FliT
instructions, and 2) adding a call to finishOp right before the end of each method.
We then show that given an implementation I, if I is linearizable, then p(I) is
durably linearizable³. We assume that all method implementations are
single-threaded, i.e. all plain executions I(m(v)) are totally ordered.


Theorem 3. If $\mathrm{Px86} \vdash I : \mathit{Lin}(S)$, then $\mathrm{FliT} \vdash p(I) : \mathit{DurLin}(S)$.


4.2 The Mirror Library 


The Mirror [10] persistent library has similar goals to FliT. The main difference
between the two is that Mirror operations do not offer two variants, and their
operations are implemented differently from those of FliT. Specifically, in Mirror
each location has two copies: one in persistent memory to ensure durability,
and one in volatile memory for fast access. As such, read operations are implemented
as simple loads from volatile memory, while writes have a more involved
implementation than those of FliT.


3 The definition here is the same as in §2, as hb-linearizations of the execution still
yield sequential executions.

We present the Mirror specification and implementation in the technical ap- 
pendix where we also prove that its implementation is correct against its spec- 
ification. As with FliT, we further prove that Mirror can be used to convert 
linearizable data structures to durably linearizable ones, as described above. 


5 Case Study: Persistent Transactional Library 


We revisit the Ltrans transactional library, develop its formal specification and
verify its implementation (Fig. 1) against it. Recall the simple Ltrans implementation
in Fig. 1 and that we do not allow for nested transactions. The implementation
uses an undo-log which records the former values of persistent registers
(locations) modified in a transaction. If, after a crash, the recovery mechanism
detects a partially persisted transaction (i.e. the last entry in the undo-log is not
COMMITTED), then it can use the undo-log to restore registers to their former
values. The implementation uses a durably linearizable queue library Q, and
assumes that Ltrans itself is externally synchronized: the user is responsible for
ensuring no two transactions are executed in parallel. We formalize this using a
global well-formedness condition.
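
The undo-log mechanism described above can be sketched as follows. This is an illustrative model under stated assumptions, not the implementation of Fig. 1: the class `UndoLogTransactions` and its method names are hypothetical, the log is a plain Python list rather than the queue library Q, all state is treated as already persistent, and a crash is modeled simply by calling `recover` before `end`. Registers are assumed never to hold `None`, which the sketch uses to mean "previously unset".

```python
COMMITTED = "COMMITTED"


class UndoLogTransactions:
    """Hypothetical undo-log transaction model (not the paper's Fig. 1 code)."""

    def __init__(self):
        self.registers = {}   # persistent registers
        self.log = []         # persistent undo-log

    def begin(self):
        self.log.clear()

    def write(self, reg, value):
        # Record the former value before overwriting the register.
        self.log.append((reg, self.registers.get(reg)))
        self.registers[reg] = value

    def end(self):
        self.log.append(COMMITTED)

    def recover(self):
        # A log whose last entry is not COMMITTED marks a partial transaction:
        # roll its writes back in reverse order.
        if self.log and self.log[-1] != COMMITTED:
            for reg, old in reversed(self.log):
                if old is None:
                    self.registers.pop(reg, None)
                else:
                    self.registers[reg] = old
        self.log.clear()
```

Running a committed transaction followed by an interrupted one leaves only the effects of the committed one after `recover`, which is the atomicity behavior the specification in §5.1 asks for.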

Later in §5.2 we develop a wrapper library Lstrans for Ltrans that additionally
provides synchronization using locks and prove that our implementation of this 
library is correct. To do this, we need to make small modifications to the structure 
of the specification: the specification in §2 requires that any ‘transaction-aware 
operation’ (i.e. those tagged with T) be enclosed in calls to PTBegin and PTEnd. 
Since Lstrans wraps the calls to PTBegin and PTEnd, the well-formedness condition
needs to be generalized to allow operations tagged with T to appear between 
calls to operations that behave like PTBegin and PTEnd. To that end, we add two 
new tags B and E to denote such operations, respectively. 


5.1 Specification 


The Ltrans library provides four tags: 1) T for transaction-aware 'client' operations;
2) $P^{tr}$ for operations that have persisted using transactions; and 3) B, E for
operations that begin and end transactions, respectively. We write
$\mathcal{R}, \mathcal{W}, \mathcal{B}, \mathcal{E}, \mathcal{RC}$
respectively for the sets of events labeled with read, write, begin, end and recovery
methods. As before, we write e.g. $\lfloor T \rfloor$ for the set of events tagged with T. Note
that while $\mathcal{B}$ denotes the set of the begin events in library Ltrans, $\lfloor B \rfloor$ denotes
the set of all events that are tagged with B, which includes $\mathcal{B}$ (of library Ltrans) as
well as events of other (non-Ltrans) libraries that may be tagged with B; similarly
for $\mathcal{E}$ and $\lfloor E \rfloor$. As such, our local specifications below (i.e. local well-formedness
and consistency) are defined in terms of $\mathcal{B}$ and $\mathcal{E}$, whereas our global specifications
are defined in terms of $\lfloor B \rfloor$ and $\lfloor E \rfloor$. As before, for brevity we write e.g.
$[T]$ as a shorthand for the relation $[\lfloor T \rfloor]$.


4 For example, take any linearizable queue implementation and use the FliT library as
described in §4.


We next define the 'same-transaction' relation strans:


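$\mathit{strans} \triangleq [\lfloor B \rfloor \cup \lfloor E \rfloor \cup \lfloor T \rfloor]; (\mathit{po} \cup \mathit{po}^{-1}); [\lfloor B \rfloor \cup \lfloor E \rfloor \cup \lfloor T \rfloor] \setminus ((\mathit{po}; [E]; \mathit{po}) \cup (\mathit{po}; [B]; \mathit{po}))$

An execution is locally well-formed iff the following hold: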


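1. A transaction must be opened before it is closed: $\mathcal{E} \subseteq \mathit{rng}([\mathcal{B}]; \mathit{po})$

2. Transactions are not nested and are matching: $[\mathcal{E}]; \mathit{po}; [\mathcal{E}] \subseteq [\mathcal{E}]; \mathit{po}; [\mathcal{B}]; \mathit{po}; [\mathcal{E}]$
and $[\mathcal{B}]; \mathit{po}; [\mathcal{B}] \subseteq [\mathcal{B}]; \mathit{po}; [\mathcal{E}]; \mathit{po}; [\mathcal{B}]$

3. Transactions must be externally synchronized: $\mathcal{E} \times \mathcal{B} \subseteq \mathit{hb} \cup \mathit{hb}^{-1}$

4. The recovery routine must be called after each crash before using the library:
$\lightning; \mathit{hb}; [\mathcal{B}] \subseteq \lightning; \mathit{hb}; [\mathcal{RC}]; \mathit{hb}; [\mathcal{B}]$

5. Events are correctly tagged: $\mathcal{W} \cup \mathcal{R} \subseteq \lfloor T \rfloor$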


An execution is globally well-formed if client operations are inside transactions: 


6. $\lfloor T \rfloor \subseteq \mathit{rng}([B]; \mathit{po})$
7. $[E]; \mathit{po}; [T] \subseteq [E]; \mathit{po}; [B]; \mathit{po}; [T]$
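
For intuition, the conditions that mention only program order (1, 2, 6 and 7) can be read sequentially on a single thread: begins and ends must alternate, starting with a begin, and every T-tagged operation must fall inside an open transaction. The sketch below checks exactly this reading. It is an assumption-laden simplification: the function name `well_formed` is hypothetical, events are reduced to their tags, and hb, crashes and recovery (conditions 3 to 5) are ignored.

```python
def well_formed(trace):
    """Check a single-thread trace, given as a list of tags 'B', 'E', 'T'
    in program order, against a sequential reading of conditions 1, 2, 6, 7."""
    open_txn = False
    for tag in trace:
        if tag == "B":
            if open_txn:
                return False      # two begins without an end in between (2)
            open_txn = True
        elif tag == "E":
            if not open_txn:
                return False      # end without a matching begin (1, 2)
            open_txn = False
        elif tag == "T":
            if not open_txn:
                return False      # client operation outside a transaction (6, 7)
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    return True
```

For example, `["B", "T", "T", "E", "B", "E"]` passes, while `["B", "E", "T"]` fails because the T-tagged operation follows an end without an intervening begin.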


An execution is locally-consistent if there exists a relation rf satisfying: 


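8. rf relates writes to reads, $\mathit{rf} \subseteq \mathcal{W} \times \mathcal{R}$, such that each read is related to
exactly one write (i.e. $\mathit{rf}^{-1}$ is total and functional).
9. Reads access the most recent write: $\mathit{rf}^{-1}; \mathit{hb} \subseteq \mathit{hb}$
10. External reads (reading from a different transaction) read from persisted
writes: $\mathit{dom}(\mathit{rf} \setminus \mathit{strans}) \subseteq \lfloor P^{tr} \rfloor$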


An execution is globally-consistent if there exists an order nvo over |T| satisfying: 


11. Transactions are nvo-ordered: $[E]; \mathit{hb}; [B] \subseteq \mathit{nvo}$

12. nvo is the persistence order: $\mathit{dom}(\mathit{nvo}; [P^{tr}]) \subseteq \lfloor P^{tr} \rfloor$;

13. Either all the events or none of the events in a transaction persist (atomicity):
$[P^{tr}]; \mathit{strans}; [T] \subseteq [P^{tr}]$

14. All events of a completed transaction (ones with an associated end event)
persist: $\lfloor E \rfloor^{c} \subseteq \lfloor P^{tr} \rfloor$, where $\lfloor E \rfloor^{c}$ denotes the set of method calls tagged
with E which have completed.


Theorem 4. The Ltrans implementation in Fig. 1 over Px86 is correct.


5.2 Vertical Library Composition: Adding Internal Synchronization 


We next demonstrate how our framework can be used for vertical library compo- 
sition, where an implementation of one library comprises calls to other libraries 
with non-trivial global specifications. To this end, we develop Lstrans, a wrapper
library around Ltrans that is meant to be simpler to use by providing synchronization
internally: rather than the user ensuring synchronization for Ltrans, one
can use Lstrans to prevent two transactions from executing in parallel. More formally,
the well-formedness condition (3) of Ltrans becomes a correctness guarantee
of Lstrans. We consider a simple implementation of Lstrans that uses a global lock
acquired at the beginning of each transaction and released at the end as shown
below.


globals lock := L.new()
method LPTBegin() := L.acq(lock); PTBegin()
method LPTEnd() := PTEnd(); L.rel(lock)


Theorem 5. The implementation of Lstrans above is correct. 


Using compositional correctness, the main proof obligation is the condition stip- 
ulating that the implementation be well-formed, ensuring that Ltrans is used 
correctly by the Lstrans implementation. This is straightforward as we can as- 
sume there exists an immediate prefix that is consistent. The existence of the 
hb-ordering of calls to PTBegin and PTEnd follows from the consistency of the 
global lock used by the implementation. 
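
The wrapper can be mimicked in a few lines of Python to see why the lock yields the required ordering of PTBegin and PTEnd calls. This is a sketch under assumptions, not the verified implementation: `TransLibStub` is a hypothetical stand-in for Ltrans that merely records whether two transactions ever overlap, and `threading.Lock` plays the role of the lock library L.

```python
import threading


class TransLibStub:
    """Hypothetical stand-in for Ltrans: it only detects overlapping transactions."""

    def __init__(self):
        self.active = False
        self.overlap_detected = False

    def pt_begin(self):
        if self.active:
            self.overlap_detected = True
        self.active = True

    def pt_end(self):
        self.active = False


class SyncTransLib:
    """Sketch of Lstrans: a global lock held for the duration of each transaction."""

    def __init__(self, inner):
        self.inner = inner
        self.lock = threading.Lock()   # globals lock := L.new()

    def lpt_begin(self):
        self.lock.acquire()            # L.acq(lock); PTBegin()
        self.inner.pt_begin()

    def lpt_end(self):
        self.inner.pt_end()            # PTEnd(); L.rel(lock)
        self.lock.release()
```

Because `pt_begin` and `pt_end` only ever run while the lock is held, calls from different threads are totally ordered by lock acquisition, so the stub never observes an overlap regardless of how many client threads call the wrapper.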


5.3 Horizontal Library Composition 


We next demonstrate how our framework can be used for horizontal library 
composition, where a client program comprises calls to multiple libraries. To 
this end, we develop a simple library, Lcntr, providing a persistent counter to be
used in sequential (single-threaded) settings: if a client uses Lcntr in concurrent
settings, it must call its methods within critical sections. Lcntr provides three
operations to create (NewCounter), increment (CounterInc) and read a counter
(CounterRead). The specification and implementation of Lcntr are given in [34].

As Lcntr uses the tags of Ltrans, we define Lcntr.Atags $\triangleq$ {Ltrans}. All of its
operations are tagged with T. As such, Lcntr inherits the global well-formedness
condition of Ltrans, meaning that Lcntr operations must be used within transactions
(i.e. hb-between operations respectively tagged with B and E). Putting it
all together, the following client code snippet uses Lcntr in a correct way, even
though Lcntr has no knowledge of the existence of Lstrans.


c = NewCounter(); LPTBegin(); CounterInc(c); CounterInc(c); LPTEnd();


Specifically, the above is an instance of horizontal library composition (as the
client comprises calls to both Lstrans and Lcntr), facilitated in our framework
through global specifications.


6 Conclusions, Related and Future Work 


We presented a framework for specifying and verifying persistent libraries, and 
demonstrated its utility and generality by encoding existing correctness notions 
within it and proving the correctness of the FliT and Mirror libraries, as well as 
a persistent transactional library. 

Related Work. The most closely related body of work to ours is [26]. How-
ever, while their framework can be used for specifying only the consistency guar-
antees of a library, ours can be used to specify both consistency and persistency 
guarantees. More generally, our tag system extends the expressivity of [26] with 
support for global effects such as some types of fences. 

Existing literature includes several works on formal persistency models, both 
for hardware [25,30,31,5,6,19,29,28] and software [4,21,11], as well as correct- 
ness conditions for persistent libraries such as durable linearizability [17]. As we 
showed in §3, such models can be specified in our framework. 

Libraries have also been specified using an operational approach [33]
instead of the declarative approach that we advocate here. While that approach
is not generic in the memory model, it supports weak memory (a fragment of the
C++11 memory model) as well as synchronization that is internal and external
to the library. Another framework for formalizing the behavior of concurrent
objects in the presence of weak memory is [18], which is more syntactic than our
framework: they use a process calculus, which allows them to handle callbacks
between the library and the client. Extending our framework, which is more
semantic, to handle this setting would probably require shifting from
executions/histories to something similar to game semantics.

Additionally, there are several works on implementing and verifying algo- 
rithms that operate on NVM. [9] and [36] respectively developed persistent queue 
and set implementations in Px86. [8] provided a formal correctness proof of the 
implementation in [36]. All three of [8,36,9] assume that the underlying concur- 
rency model is SC [23], rather than that of Px86 (namely TSO). As we demon- 
strated in §4-§5 we can use our framework to verify persistent implementations 
modularly while remaining faithful to the underlying concurrency model. [27,2] 
have developed persistent program logics for verifying programs under Px86. [20] 
recently formalized the consistency and persistency semantics of the Linux ext4 
file system, and developed a model-checking algorithm and tool for verifying the 
consistency and persistency behaviors of ext4 applications such as text editors. 

Recently, and independently of this work, Bodenmüller et al. [3] have proved
the correctness of the FliT library under TSO. They used an operational approach,
modeling the libraries and the memory and persistency models operationally
using automata, and proved a simulation result using KIV, a specialized
proof assistant. As in this paper, they proved that a linearizable library using
FliT becomes durably linearizable.


Future Work. We believe our framework will pave the way for further work 
on verifying persistent libraries, whether manually (as done here), possibly with 
the assistance of an interactive theorem prover and/or program logics such as 
those of [7,27,2], or automatically via model checking. The work of [7] uses the 
framework of [26] to specify data structures in a program logic, and it would be 
natural to extend it to our framework for persistency. Existing work in the latter 
research direction, e.g. [12,20], has so far only considered low-level properties, 
such as the absence of races or the preservation of user-supplied invariants. It has 
not yet considered higher-level functional correctness properties, such as durable 
linearizability and its variants. We believe our framework will be helpful in that
regard. In a more theoretical direction, it would be interesting to understand 
how our compositional correctness theorem fits in general settings for abstract 
logical relations such as [16]. 

Abstract. Hyperproperties specify the behavior of a system across mul-
tiple executions, and are an important extension of regular temporal
properties. So far, such properties have resisted comprehensive treat- 
ment by software model-checking approaches such as IC3/PDR, due to 
the need to find not only an inductive invariant but also a total alignment 
of different executions that facilitates simpler inductive invariants. 

We show how this treatment is achieved via a reduction from the verification
problem of $\forall^*\exists^*$ hyperproperties to Constrained Horn Clauses
(CHCs). Our starting point is a set of universally quantified formulas in
first-order logic (modulo theories) that encode the verification of $\forall^*\exists^*$
hyperproperties over infinite-state transition systems. The first-order encoding
uses uninterpreted predicates to capture the (1) witness function
for existential quantification over traces, (2) alignment of executions, 
and (3) corresponding inductive invariant. Such an encoding was previ- 
ously proposed for k-safety properties. Unfortunately, finding a satisfying 
model for the resulting first-order formulas is beyond reach for modern 
first-order satisfiability solvers. Previous works tackled this obstacle by 
developing specialized solvers for the aforementioned first-order formu- 
las. In contrast, we show that the same problems can be encoded as 
CHCs and solved by existing CHC solvers. CHC solvers take advantage 
of the unique structure of CHC formulas and handle the combination of 
quantifiers with theories and uninterpreted predicates more efficiently. 
Our key technical contribution is a logical transformation of the afore- 
mentioned sets of first-order formulas to equi-satisfiable sets of CHCs. 
The transformation to CHCs is sound and complete, and applying it to 
the first-order formulas that encode verification of hyperproperties leads 
to a CHC encoding of these problems. We implemented the CHC en- 
coding in a prototype tool and show that, using existing CHC solvers 
for solving the CHCs, the approach already outperforms state-of-the-art 
tools for hyperproperty verification by orders of magnitude. 


1 Introduction 


Hyperproperties are properties that relate multiple execution traces, either
taken from a single program or from multiple programs. Checking such properties
is known as relational verification, and is essential when reasoning about security
policies, program equivalence, concurrency protocols, etc. Existing specification
languages for hyperproperties [14,6,43] extend standard ones, e.g., temporal logic
or Hoare logic, with (explicit or implicit) quantification over traces. This shifts
the focus from properties of individual traces to properties of sets of traces. For
example, k-safety [15] is a class of hyperproperties, where k universal quantifiers
are used to define a relational invariant over states originating from k traces.


This paper addresses verification of hyperproperties with $\forall^*\exists^*$ quantification
over traces and a body of the form $\Box\varphi$ (where $\Box$ stands for “globally”). This
fragment captures many hypersafety (e.g., the aforementioned k-safety) and
hyperliveness properties, and was shown by [8] to express a wide class of properties
of interest, including generalized non-interference (GNI) [88].


Verification of hyperproperties is more challenging than verification of single- 
trace properties, and, as a result, has gained a lot of attention in recent years. 
Unlike single-trace properties, verification of properties of k traces requires the 
discovery of relational inductive invariants, which define the relation between 
states of k execution traces. Since the construction of invariants that hold be- 
tween any k reachable states is hard (or even impossible, depending on the 
assertion logic), proving hyperproperties often hinges on finding an alignment of 
any k traces such that the invariant only needs to describe aligned states. 


In the case of k-safety properties, an alignment of traces is often given by a
self composition [5,44] of the program, composing different copies of the program
(or several different programs) together, e.g., by running the different copies in
lockstep or by more sophisticated composition schemes, e.g., [24]. While self
composition reduces k-safety verification to standard safety verification,
this reduction requires choosing the alignment of the different copies a priori.
The choice of alignment, however, has a significant effect on the complexity
of the inductive invariants themselves, as demonstrated by [41]. This renders
the standard reduction from k-safety verification to safety verification, based
on a fixed alignment, impractical in many cases. As a result, finding a good
alignment as part of relational verification has been a topic of interest in recent
years [43,27,45,6,8].

In the case of hyperliveness properties that stem from the use of existential
quantification over traces (i.e. $\forall^*\exists^*$ properties), complexity rises further.
Verifying such hyperliveness properties calls for finding “witness” traces that
match the universally quantified traces, in addition to the relational invariant
and alignment. This reduces verification of $\forall^*\exists^*$ properties to the problem of
inferring three ingredients: (i) a witness function for existential quantification over
traces, (ii) an alignment of traces, and (iii) a corresponding relational inductive
invariant. These ingredients are all interdependent: different witnesses call for
different alignments and give rise to different invariants, with different levels of
complexity. It is therefore desirable to search for the combination of the three of
them simultaneously, which is the focus of this paper.


We propose a novel reduction from verification of hyperproperties with a
$\forall^*\exists^*$ quantification prefix over infinite-state transition systems to satisfiability
of Constrained Horn Clauses (CHCs) [1,10], also known as CHC-SAT. Importantly,
the reduction does not fix any of the aforementioned verification ingredients,
in particular, the alignment, a priori. Instead, it is based on a CHC
encoding of their joint requirements. The unique structure of CHCs makes it
possible to adopt software model checking techniques (e.g. interpolation [89],
IC3/PDR [32,35]) for solving them. Our reduction thus allows the use of
state-of-the-art CHC solvers [28,33,31,49] to achieve a highly efficient hyperproperty
verification procedure.

While it is known that safety verification can be reduced to CHC-SAT, we
are the first to show how inferring the combination of a witness function, a trace
alignment and an inductive invariant for hyperproperties of the $\forall^*\exists^*$-fragment
can be reduced to CHC-SAT.


The first step of our reduction to CHC-SAT is an encoding of the joint re- 
quirements of the witness-alignment-invariant ingredients as a set of universally 
quantified formulas in first-order logic (FOL) modulo theories, where uninter- 
preted predicates capture the witness, alignment and invariant, and first-order 
theories (e.g., arithmetic and arrays) are used for modeling the transition sys- 
tem and the requirements. Such an encoding has been proposed by [41] for the 
problem of finding an invariant together with an alignment in the context of 
verification of k-safety properties (the universally quantified subset of this
fragment). We extend their FOL encoding to $\forall^*\exists^*$ properties, based on the game
semantics introduced in [8].


Unfortunately, the resulting FOL formulas are beyond what modern first-order
satisfiability solvers can handle due to a combination of quantifiers with
theories and uninterpreted predicates. In particular, the FOL formulas are not in
the form of CHCs. As a result, previous works [41,45] that used a similar encoding
could not rely on a (single) CHC-SAT query to find the alignment and invariant
simultaneously. Instead, [41] resorted to an enumeration of potential alignments,
using a separate CHC-SAT query to search for an inductive invariant (in a
restricted language) for each candidate alignment, while [45] developed a specialized
solver that is able to handle these non-CHC formulas directly.


In contrast to previous works, we introduce a second step where we transform 
the set of universally quantified FOL formulas to a set of universally quantified 
CHCs. This step—which is also the key technical contribution of the paper— 
allows us to use any CHC solver for hyperproperty verification, and benefit from 
current and future developments in this lively area of research. We emphasize 
that the transformation to CHCs is surprising since it allows us to overcome 
a seemingly unavoidable obstacle: a disjunction of atomic formulas involving 
unknown predicates, which arises from the encoding of a choice between different 
alignment and witness options. 


We implemented the reduction of $\forall^*\exists^*$-hyperproperty verification to CHC-SAT
in a tool called HyHorn, on top of Z3 [23], using SPACER as a CHC
solver. Our results show that HyHorn is very efficient in verifying
$\forall^*\exists^*$-hyperproperties, outperforming the state-of-the-art [45,8,41] by orders of magnitude.


Our main contributions are: 

— We develop a satisfiability-preserving transformation of first-order formulas 
of a certain form to CHCs. The transformation is accompanied by a bi- 
directional translation of solutions. 
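
(a)

pre ($a_1 < a_2 \land b_1 > b_2$)
squaresSum(int a, int b){
  assume (0 < a < b);
  int c=0;
  while (a<b) {c+=a*a; a++;}
  return c;
}
post ($c_1 > c_2$)

$a_1 < a_2 \land b_1 > b_2 \rightarrow \forall \pi_1 : \neg(a < b),\ \pi_2 : \neg(a < b).\ \Box\,(c_1 > c_2)$

(b)

(1) $\mathit{Init}(V_1) \land \mathit{Init}(V_2) \land a_2 > a_1 \land b_2 < b_1 \rightarrow \mathit{Inv}(V_1, V_2)$
(2) $\mathit{Inv}(V_1, V_2) \land A_{\{1\}}(V_1, V_2) \land \mathit{Tr}(V_1, V_1') \land V_2 = V_2' \rightarrow \mathit{Inv}(V_1', V_2')$
(3) $\mathit{Inv}(V_1, V_2) \land A_{\{2\}}(V_1, V_2) \land V_1 = V_1' \land \mathit{Tr}(V_2, V_2') \rightarrow \mathit{Inv}(V_1', V_2')$
(4) $\mathit{Inv}(V_1, V_2) \land A_{\{1,2\}}(V_1, V_2) \land \mathit{Tr}(V_1, V_1') \land \mathit{Tr}(V_2, V_2') \rightarrow \mathit{Inv}(V_1', V_2')$
(5) $\mathit{Inv}(V_1, V_2) \land A_{\{1\}}(V_1, V_2) \rightarrow a_1 < b_1$
(6) $\mathit{Inv}(V_1, V_2) \land A_{\{2\}}(V_1, V_2) \rightarrow a_2 < b_2$
(7) $\mathit{Inv}(V_1, V_2) \land A_{\{1,2\}}(V_1, V_2) \rightarrow (a_1 < b_1 \land a_2 < b_2) \lor (a_1 \geq b_1 \land a_2 \geq b_2)$
(8) $\mathit{Inv}(V_1, V_2) \rightarrow ((a_1 \geq b_1 \land a_2 \geq b_2) \rightarrow c_1 > c_2)$
(9) $\mathit{Inv}(V_1, V_2) \rightarrow A_{\{1\}}(V_1, V_2) \lor A_{\{2\}}(V_1, V_2) \lor A_{\{1,2\}}(V_1, V_2)$

Fig. 1: (a) A program that computes the sum of squares of integer interval [a, b)
with a 2-safety specification for it, and (b) its first-order encoding.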


— We apply the transformation to obtain, for the first time, a sound and complete
reduction from verification of $\forall^*\exists^*$-OHyperLTL (w.r.t. a game semantics)
to CHC-SAT. The reduction captures searching for an alignment, an
$\exists^*$-witness function and an inductive invariant simultaneously. It is applicable
to infinite-state transition systems, with the caveat that their branching
degree needs to be finite (bounded by a constant) if the hyperproperty includes
$\exists^*$ quantification.

— To handle $\exists^*$ in the presence of unbounded nondeterminism, we incorporate
into the CHC encoding a sound abstraction based on a set of underapproximations
(“restrictions”).

— We implement a tool, HyHorn, that constructs CHCs for $\forall^*\exists^*$-OHyperLTL
specifications, and solves them using SPACER. In most cases, HyHorn discovers
the solution completely automatically, while in some, it uses predicate
abstraction, based on user-provided predicates.


2 Overview 


We illustrate our approach for verifying hyperproperties by reduction to CHC- 
SAT. We start with the simpler case of k-safety properties, followed by the more 
general case of V*3* hyperproperties. 


2.1 Motivating Example 


As a means for highlighting the challenges in verifying hyperproperties, and, in
particular, in reducing the problem to CHC solving, we present the example
program squaresSum and its 2-safety specification from [41] in Fig. 1a. Given
positive integers a < b, the program computes the sum of squares of all integers
in the interval [a,b). squaresSum is monotone in the sense that as the input
interval increases, so does the output c. Formally, this is a 2-safety property that
requires that whenever two traces satisfy the pre-condition $[a_2, b_2) \subset [a_1, b_1)$,
they also satisfy the post-condition $c_1 > c_2$, where variable indices correspond
to the traces that they represent. This is a special case of k-safety, where the
relational property is checked at the end of the executions. More generally, we
consider k-safety properties where the relational property is specified at designated
observation points (explained in Sec. 3).


To verify the 2-safety property, a prominent approach is to reduce the problem
to a regular safety verification problem by composing the program with
itself (known as “self composition”). There are (infinitely) many possibilities
for aligning the traces in the composed system, and the alignment chosen has
direct impact on the complexity of the inductive invariant needed to establish
safety. For example, if the two traces of squaresSum are aligned in lockstep,
then initially $c_1 = c_2$, after one step, $c_1 < c_2$, and only later on, $c_1 > c_2$. Showing
that $c_1 > c_2$ at the end requires tracking the difference $c_1 - c_2$, which is
a complex value because it involves the sum of squares itself. This cannot be
captured by an inductive invariant in first-order logic using theories currently
supported by automated solvers (e.g., linear arithmetic) and is therefore beyond
reach for state-of-the-art solvers. On the other hand, if the second trace, whose
input is the smaller interval, “waits” for $a_1$ and $a_2$ to coincide before proceeding
in lockstep, then the property that $c_1 > c_2$ becomes inductive (except for the
first step), greatly simplifying the inductive invariant. It is therefore important
to consider the alignment and the (relational) inductive invariant together.
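
The effect of the two alignments can be observed concretely by executing squaresSum on two inputs and recording the aligned state pairs. The following is an illustrative sketch, not part of the verification method: the function names `step`, `lockstep` and `waiting` are hypothetical, a state is the triple (a, b, c), and the inputs are assumed to satisfy the pre-condition $a_1 < a_2$ and $b_1 > b_2$.

```python
def step(a, b, c):
    """One loop iteration of squaresSum; the caller must ensure a < b."""
    return a + 1, b, c + a * a


def lockstep(s1, s2):
    """Advance every unfinished trace in each round; yield aligned state pairs."""
    yield s1, s2
    while s1[0] < s1[1] or s2[0] < s2[1]:
        if s1[0] < s1[1]:
            s1 = step(*s1)
        if s2[0] < s2[1]:
            s2 = step(*s2)
        yield s1, s2


def waiting(s1, s2):
    """Trace 2 stutters until a1 == a2, then both step together;
    once trace 2 has exited its loop, trace 1 finishes alone."""
    yield s1, s2
    while s1[0] < s1[1] or s2[0] < s2[1]:
        if s1[0] < s2[0] or s2[0] >= s2[1]:
            s1 = step(*s1)                    # schedule {1}
        else:
            s1, s2 = step(*s1), step(*s2)     # schedule {1,2}
        yield s1, s2
```

With inputs (a, b) = (1, 6) for the first trace and (3, 5) for the second, the lockstep alignment passes through pairs with $c_1 < c_2$, whereas under the waiting alignment every pair after the initial one satisfies $c_1 > c_2$; both alignments end in the same final states.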


The requirements that the alignment and inductive invariant need to satisfy
can be formulated in first-order logic [41]. To do so, we denote the program
variables by V = (a, b, c). We express the initial states and program steps as
formulas over V (and primed variant V'):
$\mathit{Init}(V) \triangleq a > 0 \land b > a \land c = 0$,
$\mathit{Tr}(V, V') \triangleq a < b \land c' = c + a \cdot a \land a' = a + 1 \land b' = b$.
To reason about two
traces, we use two copies of V, denoted $V_1$ and $V_2$. We introduce “unknown”
predicates $\mathit{Inv}$, $A_{\{1\}}$, $A_{\{2\}}$, $A_{\{1,2\}}$ over $(V_1, V_2)$ to capture the inductive invariant
and desired alignment of the traces. $\{A_u\}_u$ define an arbiter that, when $A_u$ is
satisfied, schedules the steps of the traces according to u (for example, schedule
u = {1} stands for a step in trace 1 and a stutter in trace 2). The arbiter
therefore determines the alignment of the traces. The inductive invariant Inv
relates states of the two copies of the program, making it relational.


The problem of searching for the alignment and the inductive invariant
simultaneously is then posed as a satisfiability problem (modulo the theory of
arithmetic) of the formulas in Fig. 1b. To ensure that the arbiter, which determines
the alignment, does not avoid violations of the post-condition by making
one of the traces stutter forever so that it never reaches its final state, formulas 5-7
require that the arbiter only schedule a trace if it has not exited the loop, unless
both traces exited the loop (in which case both are scheduled). This “validity”
requirement means that, at the latest, the arbiter must schedule a trace when
the other reaches the final state. Formulas 1-4 then ensure that all states that
are reachable, subject to the steps permitted by the arbiter, must satisfy Inv.
Specifically, the first formula ensures the initiation condition of the inductive
invariant: the invariant satisfies the pre-condition and includes all the initial states
of the composed system. Formulas 2-4 ensure the consecution of the invariant
under every choice the arbiter makes. The 8th formula ensures the safety of the
invariant and the last formula mandates that there is always at least one choice
that is enabled, and that the system never reaches a “stuck” state.

An interpretation for the unknown predicates $\mathit{Inv}$, $A_{\{1\}}$, $A_{\{2\}}$, $A_{\{1,2\}}$ defines
an arbiter and a corresponding inductive invariant. A possible solution is


$A_{\{1\}}(V_1, V_2) \triangleq a_1 < a_2 \lor (b_2 \leq a_1 < b_1)$
$A_{\{2\}}(V_1, V_2) \triangleq \bot$
$A_{\{1,2\}}(V_1, V_2) \triangleq (a_1 = a_2 \land a_2 < b_2) \lor a_1 \geq b_1$
$\mathit{Inv}(V_1, V_2) \triangleq 0 < a_1 \leq b_1 \land 0 < a_2 \leq b_2 \land ((a_1 < a_2 \land c_1 \geq c_2) \lor (a_1 \geq a_2 \land c_1 > c_2))$


This solution captures the arbiter that makes the second trace wait until
$a_1 = a_2$, then makes both traces proceed together until the second one exits its
loop, in which case the first trace continues to execute alone until it also exits
its loop and both traces are again (vacuously) scheduled together. The solution
to Inv captures the corresponding inductive invariant previously discussed.


2.2 Challenges in Encoding Hyperproperty Verification as 
CHC-SAT 


The formulas of Fig. 1b, with the exception of the last one, are constrained
Horn clauses. That is, when the implications in these formulas are converted to 
disjunctions, at most one predicate application appears positively in each clause. 

Alas, the presence of the last formula precludes direct application of exist- 
ing CHC solvers. The problem is the disjunction on the right hand side of the 
implication. Such a disjunction appears to be crucial for a correct encoding of 
the problem. The reason is that uninterpreted predicates designate semantic re- 
lations. With such predicates denoting the choice of schedule, it is easy to drop 
into a vacuous solution where some states have no corresponding choice and are 
essentially “stuck”, unsoundly making a post-condition violation unreachable. 
Encoding the requirement that every state have a schedule results in a clause 
with multiple occurrences of positive literals, capturing inherent disjunctions 
over the possible choices, which are not Horn. In particular, these disjunctions 
cannot be eliminated by renaming [37]. 
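
The Horn restriction can be made concrete by counting the unknown predicates that occur positively in a clause. The sketch below is an illustration only: it represents each implication by the unknown predicates in its body and head and ignores the constraints; the `Clause` type and the three sample clauses are hypothetical names introduced here.

```python
from typing import NamedTuple, Tuple


class Clause(NamedTuple):
    body: Tuple[str, ...]  # unknown predicates on the left of the implication
    head: Tuple[str, ...]  # unknown predicates on the right (a disjunction); () is "false"


def is_horn(clause):
    """A clause is Horn when at most one unknown predicate occurs positively."""
    return len(clause.head) <= 1


consecution = Clause(body=("Inv", "A_{1}"), head=("Inv",))
safety = Clause(body=("Inv",), head=())
every_state_scheduled = Clause(body=("Inv",), head=("A_{1}", "A_{2}", "A_{1,2}"))
```

Under this reading the consecution and safety formulas are Horn, while the requirement that every state has a schedule has three positive occurrences and is not.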

Previous works tackled this obstacle either by employing explicit enumeration 
of alignments that satisfy the non-Horn clause to avoid the disjunction [41], or by 
developing specialized techniques that are able to handle such disjunctions [45]. 


2.3 Our Approach: Transformation to CHC 


In this paper, we show that the problem of searching for an alignment together 
with a (relational) inductive invariant can be encoded using CHCs, allowing us 
to reduce the problem to CHC-SAT, without fixing the alignment a priori. 

A key insight of our reduction to CHC-SAT is the use of “doomed” states 
as a way to avoid the problematic disjunction over all choices of schedules. We 
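refer to a given state as “doomed” if it necessarily reaches a state that violates

\[
\begin{array}{l}
D_{\{1\}}(V_1,V_2) \land D_{\{2\}}(V_1,V_2) \land D_{\{1,2\}}(V_1,V_2) \land \mathit{Init}(V_1) \land \mathit{Init}(V_2) \land a_2 > a_1 \land b_2 < b_1 \rightarrow \bot \\
\neg(a_1 \ge b_1 \land a_2 \ge b_2 \rightarrow c_1 > c_2) \rightarrow D_{\{1\}}(V_1,V_2) \\
\neg(a_1 \ge b_1 \land a_2 \ge b_2 \rightarrow c_1 > c_2) \rightarrow D_{\{2\}}(V_1,V_2) \\
\neg(a_1 \ge b_1 \land a_2 \ge b_2 \rightarrow c_1 > c_2) \rightarrow D_{\{1,2\}}(V_1,V_2) \\
\neg(a_1 < b_1) \rightarrow D_{\{1\}}(V_1,V_2) \\
\neg(a_2 < b_2) \rightarrow D_{\{2\}}(V_1,V_2) \\
\neg(a_1 < b_1 \land a_2 < b_2) \land \neg(a_1 \ge b_1 \land a_2 \ge b_2) \rightarrow D_{\{1,2\}}(V_1,V_2) \\
D_{\{1\}}(V_1',V_2') \land D_{\{2\}}(V_1',V_2') \land D_{\{1,2\}}(V_1',V_2') \land \mathit{Tr}(V_1,V_1') \land V_2 = V_2' \rightarrow D_{\{1\}}(V_1,V_2) \\
D_{\{1\}}(V_1',V_2') \land D_{\{2\}}(V_1',V_2') \land D_{\{1,2\}}(V_1',V_2') \land V_1 = V_1' \land \mathit{Tr}(V_2,V_2') \rightarrow D_{\{2\}}(V_1,V_2) \\
D_{\{1\}}(V_1',V_2') \land D_{\{2\}}(V_1',V_2') \land D_{\{1,2\}}(V_1',V_2') \land \mathit{Tr}(V_1,V_1') \land \mathit{Tr}(V_2,V_2') \rightarrow D_{\{1,2\}}(V_1,V_2)
\end{array}
\]

Fig. 2: CHC encoding of Fig. 1b.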


the hyperproperty along every valid alignment (as opposed to some in the di- 
rect encoding). Importantly, due to this conjunctive nature, doomed states lend 
themselves to a Horn encoding. If an initial state is identified as doomed (i.e., 
the CHCs are unsatisfiable), then the property is violated and a counterexample 
can be retrieved. Otherwise, if the set of initial states does not intersect the set 
of doomed states, then the hyperproperty is proved. Moreover, given an inter- 
pretation of the unknown predicates in which the initial states are not doomed, 
an alignment and a corresponding inductive invariant can be retrieved. 
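
Because doomed states are defined conjunctively, they can be computed as a least fixpoint. The sketch below does this by brute force for a single concrete input of squaresSum rather than symbolically, so it illustrates the idea and is not the paper's procedure. It assumes the loop `while (a < b) { c += a * a; a++; }`, the post-condition c1 > c2, and the three kinds of clauses with a doomed predicate in the head (post-condition violated, schedule not valid, every choice doomed after the step); all function names are hypothetical.

```python
from itertools import product

SCHEDULES = [(1,), (2,), (1, 2)]


def step(a, b, c):
    """One squaresSum transition; a copy that left its loop stutters."""
    return (a + 1, b, c + a * a) if a < b else (a, b, c)


def successor(state, schedule):
    a1, b1, c1, a2, b2, c2 = state
    if 1 in schedule:
        a1, b1, c1 = step(a1, b1, c1)
    if 2 in schedule:
        a2, b2, c2 = step(a2, b2, c2)
    return (a1, b1, c1, a2, b2, c2)


def valid(state, schedule):
    a1, b1, _, a2, b2, _ = state
    running = {1: a1 < b1, 2: a2 < b2}
    if all(running[i] for i in schedule):
        return True
    return schedule == (1, 2) and not running[1] and not running[2]


def bad(state):
    a1, b1, c1, a2, b2, c2 = state
    return a1 >= b1 and a2 >= b2 and not c1 > c2


def doomed_sets(initial):
    # Forward closure: every composed state reachable under any schedule.
    reachable, frontier = {initial}, [initial]
    while frontier:
        s = frontier.pop()
        for m in SCHEDULES:
            t = successor(s, m)
            if t not in reachable:
                reachable.add(t)
                frontier.append(t)
    # Least fixpoint of the clauses whose head is a doomed predicate.
    doomed = {m: set() for m in SCHEDULES}
    changed = True
    while changed:
        changed = False
        for s, m in product(reachable, SCHEDULES):
            if s in doomed[m]:
                continue
            t = successor(s, m)
            if bad(s) or not valid(s, m) or all(t in doomed[n] for n in SCHEDULES):
                doomed[m].add(s)
                changed = True
    return doomed


def initial_is_doomed(a1, b1, a2, b2):
    s0 = (a1, b1, 0, a2, b2, 0)
    doomed = doomed_sets(s0)
    return all(s0 in doomed[m] for m in SCHEDULES)
```

With inputs (a1, b1, a2, b2) = (1, 4, 2, 3) the initial state is not doomed for schedule {1,2}, whereas for (2, 3, 1, 4), where c1 ends up smaller than c2, it is doomed for every schedule.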

Based on this insight, in Sec. 4 we develop a general transformation of formulas
of a certain form, to an equi-satisfiable set of CHCs. Furthermore, we provide
a transformation of solutions between the two formulations (in both directions).
The first-order formulas to which the transformation is applicable follow the
overall structure of the formulas in Fig. 1b but are somewhat more general.
For example, some of the unknown predicates may have additional arguments,
which turn out to be useful when considering a broader class of hyperproperties
beyond k-safety (∀*∃*).

In Sec. 5 we apply the transformation of Sec. 4 to reduce k-safety verification
to CHC-SAT. When applying the transformation on the formulas encoding our
running example (Fig. 1b), we obtain the set of CHCs depicted in Fig. 2 over
unknown predicates D_{1}, D_{2}, D_{1,2}.

In the CHCs of Fig. 2, an unknown predicate D_u represents states that are
“doomed” if schedule u is chosen. The first CHC requires that no initial state
that satisfies the pre-condition is completely doomed, i.e., for every such state
there is a schedule for which it is not doomed. The remaining CHCs encode the
properties of doomed states for each schedule. For example, the CHCs where
D_{1} is in the head (right hand side of the implication) imply that a state is
doomed for schedule {1} if: (a) it violates the post-condition, (b) it already
exited the loop and hence trace 1 cannot be the only trace to be scheduled, or
(c) it is the pre-state of a transition taken by 1 leading to a post-state that is
doomed for every choice u.

A solution to the CHCs in Fig. 2 can be obtained from the solution to the
formulas in Fig. 1b by D_u ≜ ¬(Inv ∧ A_u) for every u ∈ {{1}, {2}, {1, 2}}.

More generally, in Sec. 4 we show a bi-directional transformation of solutions.


2.4 Beyond k-Safety


Our transformation to CHCs is not limited to an encoding of k-safety, but also
generalizes to hyperproperties that use ∀*∃* quantification over traces, as
presented in Sec. 6.

Hyperproperties with existential trace quantification become meaningful in 
the presence of nondeterminism in the program. For an example of such a 
property, consider a nondeterministic variant of squaresSum where the assign- 
ment c += a * a is replaced by if (*) c += a * a. That is, the increment 
of c may nondeterministically be skipped. We may now wish to verify that, if 
[a2, b2) = [a1, b1), then for every trace from input [a1, b1) there exists a trace from 
input [a2, b2) such that when both terminate, c1 4 c2. This is a ∀∃-hyperproperty.

To verify such properties, a “witness” function is needed to map the univer- 
sally quantified traces to the corresponding existentially quantified traces such 
that the body of the formula holds for the combination of the traces. Even if a 
witness function is known, to verify that the combination of the traces satisfies 
the body of the formula, we still need to find a proper alignment of the traces 
and an inductive invariant. As in the case of k-safety, these components are all 
interdependent, making it desirable to search for all of them together. 

In general, the witness function for the existentially quantified traces may 
need to depend on the full universally quantified traces. However, [8] defines a 
sound but incomplete game semantics, in which the witness function essentially 
constructs the existentially quantified traces step-by-step, in response to moves 
of a “falsifier” who reveals the universally quantified traces step-by-step. 

We show in Sec. 6.1 that the problem of searching for a step-by-step witness
function, an alignment and a (relational) inductive invariant can be encoded in
first-order logic, and the encoding is amenable to our transformation to CHCs.
This results in a sound and complete CHC encoding of the game semantics of [8]
for transition systems whose branching degree is bounded by a constant, which
we henceforth refer to as “finite branching”.

The idea in the ∀*∃*-first-order encoding is to let the unknown predicates
A_u specify not only the schedules chosen by the arbiter but also the choice of
existentially quantified traces for the witness function. To do so, we assign a
unique label to each of the possible transitions, and use these labels to identify
the transitions along the traces. In this encoding, instead of u denoting a schedule
only, it now denotes both a schedule and a choice of labels identifying the next
transitions in the existentially quantified traces according to the witness function.
Furthermore, the A_u predicates receive additional arguments that represent the
next labels along the universally quantified traces.

For example, in the nondeterministic variant of squaresSum, there are at 
most two possible transitions in each control location. We therefore introduce 
two labels to distinguish between these possibilities: i for “increment” and s for 
“skip”. The predicates that describe the schedules and the choices of existentially 
quantified traces for the ∀∃-hyperproperty of interest are:


\[
A_{\{1\},i},\; A_{\{2\},i},\; A_{\{1,2\},i},\; A_{\{1\},s},\; A_{\{2\},s},\; A_{\{1,2\},s}.
\]

They are defined over (V1, V2, a), where a ranges over the possible labels.

Note that in this encoding, the A_u predicates are no longer defined over
(V1, V2) only, but have additional arguments for the labels of the universally
quantified traces, while Inv does not. Thus, the reduction to CHCs applies our
transformation in a more general setting than Fig. 1b. Furthermore, since u
denotes both a schedule and a choice of labels for the existentially quantified
traces, the number of A_u predicates depends on the number of labels. To ensure
that there are finitely many predicates, we require the transition system to have a
finite branching degree (otherwise, the space of possible labels becomes infinite).
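
The dependence of the number of predicates on the number of labels can be seen by enumerating the choices directly. In this small sketch (the function name is hypothetical), each choice pairs a non-empty schedule with one label per existentially quantified trace, for k traces of which the first l are universally quantified.

```python
from itertools import combinations, product


def verifier_choices(k, l, labels):
    """All pairs (schedule, labels for the k - l existential traces)."""
    traces = range(1, k + 1)
    schedules = [m for size in range(1, k + 1) for m in combinations(traces, size)]
    return [(m, ex) for m in schedules for ex in product(labels, repeat=k - l)]
```

For k = 2, l = 1 and the two labels i and s this yields six choices, matching the six predicates listed above; with k = l = 2 (plain 2-safety) only the three schedules remain.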

Finally, in Sec. 6.2 we extend our approach to handle infinite (or unbounded)
branching in the transition system, which can result, for example, from reading
an input from an infinite domain. To do so, we introduce another first-order
encoding that roughly replaces the infinitely-many concrete choices of transitions
by finitely-many abstract choices. Unlike the cases of k-safety and
∀*∃*-hyperproperties with finite branching, the resulting encoding is sound but
incomplete w.r.t. the game semantics. By applying our transformation, we obtain
a sound (albeit incomplete) reduction to CHC-SAT.


3 Background 


We use first-order logic to model systems and their properties. Throughout the
paper, we fix a background first-order theory 𝒯 and denote its signature by Σ.


Transition Systems. A (symbolic, labeled) transition system is a tuple TS =
(V, a, Init, Tr), where V is a vocabulary, i.e., a vector of (logical) variables, each
associated with a sort from Σ, denoting state variables; a is a label variable; Init
is a formula over Σ with free variables V, and Tr is a formula over Σ with free
variables V ∪ {a} ∪ V′, where V′ consists of the primed variants of V.

A state of TS is a valuation to V, and we denote by S the set of all such
valuations; L is the set of values that a can take, called labels; S_0 ⊆ S is the set of
initial states, which consists of all valuations that satisfy Init, and R ⊆ S × L × S
is the transition relation, which consists of the valuations for the composite
vocabulary V ∪ {a} ∪ V′ that satisfy Tr. For simplicity, we assume that R is
total, i.e., ∀s ∈ S ∃ℓ ∈ L, s′ ∈ S · (s, ℓ, s′) ∈ R.³ We say that TS is deterministic
when ∀s ∈ S, ℓ ∈ L · |{s′ | R(s, ℓ, s′)}| = 1 and that it has finite branching when
L is finite. A trace of TS is an infinite sequence of states t = s_0, s_1, … such that
for every i ≥ 0 there exists ℓ ∈ L such that (s_i, ℓ, s_{i+1}) ∈ R. We denote by t[i]
the i-th state in t. We further denote the set of traces that start from a state s
by T(s), and the set of all traces of TS by T.
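
For a finite, explicit-state system these definitions can be checked directly. The sketch below is only an illustration of the definitions (the paper works with symbolic systems): it represents R as a set of triples (s, ℓ, s′), reads totality as "every state has some outgoing transition", and makes a relation total by adding self loops to dead-end states. The function names are hypothetical.

```python
from itertools import product


def is_total(states, labels, trans):
    """Every state has at least one outgoing transition (under some label)."""
    return all(any((s, l, t) in trans for l in labels for t in states)
               for s in states)


def is_deterministic(states, labels, trans):
    """Every (state, label) pair has exactly one successor."""
    return all(sum((s, l, t) in trans for t in states) == 1
               for s, l in product(states, labels))


def add_self_loops(states, labels, trans):
    """Add self loops (under every label) to states with no outgoing transition."""
    dead = [s for s in states
            if not any((s, l, t) in trans for l in labels for t in states)]
    return trans | {(s, l, s) for s in dead for l in labels}
```

For example, the chain 0 → 1 → 2 over a single label is not total because state 2 is a dead end; after `add_self_loops` it is both total and deterministic.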


Hyperproperties and their specification. We consider a fragment of the relational
logic OHyperLTL [6], which we call ∀*∃*-OHyperLTL, with formulas of the form:

\[
\varphi = \psi \rightarrow \forall \pi_1 : \xi_1, \ldots, \pi_l : \xi_l \cdot \exists \pi_{l+1} : \xi_{l+1}, \ldots, \pi_k : \xi_k \cdot \Box\phi
\]

where π_i are trace variables whose intended valuations are taken from T; ξ_i
are (non-temporal) formulas with free variables V that determine observation
points along the k traces, where the traces must synchronize; ψ is a pre-condition
that is assumed to hold initially; and ϕ needs to globally hold when all traces
reach the observation points (which they must synchronize on before moving
on). V_j denotes a copy of V where all variables are indexed by j. We refer to the
variables in V_j as the state variables of the j’th trace (namely, π_j). When l = k,
i.e., all quantifiers are universal, φ is a k-safety property. A relational pre/post
specification, as used in our motivating example, is a special case of a k-safety
property where the observable points are the final states (which are augmented
with self loops). For example, Fig. 1a presents the ∀*∃*-OHyperLTL specification
of the motivating example. When l < k, the formula also includes existential
quantifiers, extending expressiveness to include some hyperliveness properties.
An example of a security hyperliveness property that can be expressed in
∀*∃*-OHyperLTL is generalized non-interference (GNI) [38]. GNI requires that for
any two traces π_1 and π_2 there exists a trace π_3 whose high (secret) inputs agree
with π_1 and whose low (public) inputs and outputs agree with π_2.


3 W.l.o.g.; Tr can always be replaced by Tr ∨ ((∀a ∀V′ · ¬Tr) ∧ V′ = V), which
corresponds to adding self loops to states that have no outgoing transition.


∀*∃*-OHyperLTL formulas are interpreted over transition systems. Intuitively,
φ holds in a transition system if from every k initial states that jointly
satisfy the pre-condition ψ, for every l traces from the first l states there exist
corresponding k−l traces from the remaining k−l states s.t. the composed states
of all traces globally satisfy ϕ, when the traces are projected to their observation
points. Formally, given a transition system TS and φ as above, we refer to a
tuple (s_1, …, s_k) of k states of TS as a composed state. A composed state defines
a valuation to V_1 ∪ … ∪ V_k, where s_j is the valuation of V_j. A composed state
is initial if s_i ∈ S_0 for every 1 ≤ i ≤ k. We say that TS ⊨ φ if for every initial
composed state s̄ = (s_1, …, s_k) such that s̄ ⊨ ψ the following holds: for every
t_1, …, t_l ∈ T(s_1) × ⋯ × T(s_l) there exist t_{l+1}, …, t_k ∈ T(s_{l+1}) × ⋯ × T(s_k)
such that (t_1)|ξ_1, …, (t_k)|ξ_k ⊨ □ϕ, where (t_i)|ξ_i is the projection (filtering) of
trace t_i to states satisfying ξ_i. The semantics of □ϕ is that t_1, …, t_k ⊨ □ϕ iff
∀i < min(|t_1|, …, |t_k|) · (t_1[i], …, t_k[i]) ⊨ ϕ. Note that the semantics is oblivious
to the transition labels since labels are only implicit in traces. Labels become
useful in Sec. 6 where we use them to identify transitions along traces.
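
The projection-based semantics of □ϕ can be illustrated on finite trace prefixes. The following sketch is an assumption-laden simplification (real traces are infinite, and the helper names are hypothetical): each trace is filtered to its observation points, and ϕ is then checked index by index up to the shortest projected trace.

```python
def project(trace, xi):
    """Filter a (finite prefix of a) trace to the states satisfying xi."""
    return [s for s in trace if xi(s)]


def globally(traces, observations, phi):
    """Check phi at every index shared by all projected traces."""
    projected = [project(t, xi) for t, xi in zip(traces, observations)]
    horizon = min(len(p) for p in projected)
    return all(phi(*(p[i] for p in projected)) for i in range(horizon))


# States are pairs (pc, x); observation points are the states with pc == 1.
t1 = [(0, 0), (1, 5), (0, 1), (1, 7)]
t2 = [(1, 5), (0, 9), (0, 9), (1, 7), (1, 8)]
at_obs = lambda s: s[0] == 1
same_value = lambda s1, s2: s1[1] == s2[1]
```

Here `globally([t1, t2], [at_obs, at_obs], same_value)` holds: the projected traces agree on their first two observation points, and the third observation point of `t2` has no counterpart in `t1`, so it is not compared.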


Remark 1. To simplify the presentation we consider hyperproperties defined 
w.r.t. a single transition system. The extension to multiple transition systems is 
straightforward. Similarly, □ϕ can be generalized to any temporal safety
property via the standard automata-theoretic approach to model checking.


Constrained Horn Clauses (CHCs) are defined over a signature Σ′ that extends
Σ with a set P of (uninterpreted) predicates. Symbols in Σ are called interpreted,
while the predicates in P are uninterpreted (sometimes called unknown).
First-order formulas over Σ are called constraints. A CHC is a first-order formula
of the form ∀𝒳 · ⋀_i P_i(X_i) ∧ φ(𝒳) → H(X_H), where 𝒳 is a vector of (logical)
variables; P_i ∈ P (not necessarily distinct, i.e., it is possible that P_{i1} = P_{i2} for
i1 ≠ i2); H is either ⊥ or a predicate from P; X_i, X_H ⊆ 𝒳; and φ is a constraint.
The universal quantification over 𝒳 is often omitted.

(a)
\[
\begin{array}{ll}
 & \alpha(V) \rightarrow \mathit{Inv}(V) \\
 & \mathit{Inv}(V) \land \beta(V) \rightarrow \bot \\
{[\forall u]} & \mathit{Inv}(V) \land A_u(V,W) \land \gamma_u(V,W) \rightarrow \bot \\
{[\forall u]} & \mathit{Inv}(V) \land A_u(V,W) \land \delta_u(V,V',W) \rightarrow \mathit{Inv}(V') \\
 & \mathit{Inv}(V) \rightarrow \bigvee_{u \in U} A_u(V,W)
\end{array}
\]

(b)
\[
\begin{array}{ll}
 & \bigwedge_{u \in U} D_u(V,W) \land \alpha(V) \rightarrow \bot \\
{[\forall u]} & \beta(V) \rightarrow D_u(V,W) \\
{[\forall u]} & \gamma_u(V,W) \rightarrow D_u(V,W) \\
{[\forall u]} & \bigwedge_{u' \in U} D_{u'}(V',W') \land \delta_u(V,V',W) \rightarrow D_u(V,W)
\end{array}
\]

([∀u] ≡ ∀u ∈ U)

Fig. 3: Formula scheme before (a) and after (b) the transformation.

A set of CHCs (or, more generally, first-order formulas) is satisfiable (modulo
𝒯) if it has a satisfying model M such that the projection of M onto Σ is a model
of 𝒯. A solution to a set of CHCs maps every predicate in P to a formula over
Σ that defines it such that substituting all occurrences of the predicates by their
definitions results in formulas that are valid modulo 𝒯. If a set of CHCs has a
solution then it is satisfiable. However, the converse may not hold due to the
limited expressive power of first-order formulas.
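
As a small illustration of what a solution is, consider three CHCs over one unknown predicate P and integer x: x = 0 → P(x), P(x) ∧ x < 5 → P(x + 1), and P(x) ∧ x > 5 → ⊥. This system is a hypothetical example introduced here, not taken from the paper. The sketch substitutes a candidate definition for P and checks the resulting formulas on a finite range of values, which is a bounded sanity check rather than a proof of validity.

```python
def check_solution(P, domain):
    """Check a candidate interpretation of P against the three CHCs on `domain`."""
    init = all(P(x) for x in domain if x == 0)              # x = 0 -> P(x)
    step = all(P(x + 1) for x in domain if P(x) and x < 5)  # P(x) and x < 5 -> P(x + 1)
    safe = not any(P(x) and x > 5 for x in domain)          # P(x) and x > 5 -> false
    return init and step and safe


domain = range(-10, 11)
good = lambda x: 0 <= x <= 5   # inductive and safe
too_weak = lambda x: x >= 0    # violates the last clause at x = 6
too_strong = lambda x: x == 0  # not closed under the second clause
```

Only the first candidate passes: `too_weak` admits x = 6, and `too_strong` is not preserved by the step clause.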


4 General Transformation to CHCs 


In this section we describe a satisfiability-preserving transformation that lets
us convert a set of formulas, which adheres to a specific FOL scheme, to an
equi-satisfiable set of CHCs. An extended version, with step-by-step details of
the transformation, appears in [34]. Later we show how verification of a
∀*∃*-OHyperLTL property can be captured by a set of formulas of the aforementioned
scheme, where this transformation allows us to then reason about the correctness
of the ∀*∃*-OHyperLTL property by deciding the satisfiability of the CHCs.
Consider the scheme in Fig. 3a for a set of formulas over a signature Σ′
that extends the signature Σ of the background theory by unknown predicates
Inv and {A_u}_{u∈U}, for some finite set U. V, V′, W denote disjoint vocabularies,
i.e., vectors of (logical) variables that are implicitly universally quantified. A row
prefixed by [∀u] indicates |U| formulas, where u is substituted by all corresponding
values from U. α, β, γ_u, δ_u designate constraints (no occurrence of Inv or A_u).
At a high level, formulas 1 and 4 in Fig. 3a use Inv to capture an inductive
invariant of the “states” (valuations to V) reachable from α by “transitions” of
δ_u, restricted according to a choice u ∈ U of an “arbiter” {A_u}_u. Formula 2
establishes the fact that the reachable states are disjoint from some “bad states”
β. Formulas 3 make it possible to enforce that the arbiter meets certain
requirements, and formula 5 ensures that the arbiter makes a choice for every
“state” in Inv.


Example 1. For our running example, we have V = (V_1, V_2) = (a_1, b_1, c_1, a_2, b_2, c_2),
V′ = (V′_1, V′_2) = (a′_1, b′_1, c′_1, a′_2, b′_2, c′_2), and W = () (the extra vocabulary W will
come into use later in the paper). U is the set of arbitration choices {{1}, {2}, {1,2}},
and the corresponding completion of the constraint holes α, β, γ_u, δ_u is easily
discernible. (Note that a constraint on the right of → corresponds to its negation
on the left.)


Note that the last formula in Fig. 3a is not a CHC since its head is a
disjunction of unknown predicates. To remedy this shortcoming, we transform the
formulas in Fig. 3a into the set of CHCs in Fig. 3b. The CHCs obtained for
the running example are included in the extended version of the paper [34]. The
transformation ensures:


Theorem 1. The set of formulas in Fig. 3a is equi-satisfiable to the system of
CHCs in Fig. 3b. Furthermore, there is an efficient translation of models of the
former to models of the latter, and vice versa.


Proof. The extended version of the paper includes a stepwise transformation
that shows how the CHCs in Fig. 3b are obtained from the formulas in
Fig. 3a, where each step preserves equi-satisfiability and models. Here, due to
space constraints, we only describe the final translation between models, which
we have verified with Z3:


Given Inv, A_u ⊨ [Fig. 3a]:   D_u(V, W) ≜ ¬(Inv(V) ∧ A_u(V, W))
Given D_u ⊨ [Fig. 3b]:        Inv(V) ≜ ∀W · ⋁_{u∈U} ¬D_u(V, W)
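
On a finite state space with an empty W, the two translations can be written with plain set operations. The sketch below is illustrative only (explicit sets instead of formulas, hypothetical function names) and shows that translating an invariant and arbiter to doomed sets and back recovers the invariant whenever every invariant state has some choice.

```python
def to_doomed(states, inv, arbiter):
    """D_u := not (Inv and A_u), as explicit sets of states (W is empty here)."""
    return {u: {s for s in states if not (s in inv and s in a_u)}
            for u, a_u in arbiter.items()}


def to_invariant(states, doomed):
    """Inv := some D_u does not hold (the quantifier over W is vacuous here)."""
    return {s for s in states if any(s not in d_u for d_u in doomed.values())}


states = set(range(6))
inv = {0, 1, 2, 3}
arbiter = {"u1": {0, 1, 5}, "u2": {2, 3, 4}}  # every Inv state has a choice
doomed = to_doomed(states, inv, arbiter)
```

In this example `doomed["u1"]` is {2, 3, 4, 5}, `doomed["u2"]` is {0, 1, 4, 5}, and `to_invariant(states, doomed)` returns {0, 1, 2, 3} again.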


5 Encoding k-Safety Verification as CHCs 


In this section we address the problem of verifying k-safety properties via a 
CHC encoding. To this end, we start with a natural, non-Horn, encoding of the 
problem, as described in the previous section and previous works MISIS], and 
apply our transformation to obtain an equi-satisfiable system of CHCs. 


Consider the k-safety formula: φ = ψ → ∀π_1 : ξ_1, …, π_k : ξ_k · □ϕ


This formula holds in a transition system TS if, starting from initial composed
states that satisfy the pre-condition ψ, the observable states along every tuple of
k traces satisfy ϕ, when the observable states are reached synchronously. Note
that a pre-post specification, as used in our motivating example, is a special
case of such a formula where the observable states are the final states. Verifying
φ corresponds to finding (1) an alignment of the traces that synchronizes the
observation points defined by ξ_1, …, ξ_k, and (2) an inductive invariant that
establishes that ϕ holds whenever ξ_1, …, ξ_k hold. Note that the invariant needs to
be inductive along the aligned traces, including intermediate states between
observable points. As different alignments give rise to different inductive invariants,
it is desirable to find both of them simultaneously [41].

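As before, we model the alignment using an arbiter that schedules a subset
∅ ≠ M ⊆ {1, …, k} of the traces to make a step based on the current composed
state (s_1, …, s_k). The arbiter may be nondeterministic, but it must choose at least
one set M. Furthermore, the arbiter must respect the synchronization of the
observation points: it must not let a trace proceed beyond its observation point
before the other traces have reached theirs. This motivates the following definition.

(a)
\[
\begin{array}{ll}
 & \bigwedge_i \mathit{Init}(V_i) \land \psi(V) \rightarrow \mathit{Inv}(V) \\
 & \mathit{Inv}(V) \land \mathit{Bad}(V) \rightarrow \bot \\
{[\forall M]} & \mathit{Inv}(V) \land A_M(V) \land \neg\mathit{valid}_M(V) \rightarrow \bot \\
{[\forall M]} & \mathit{Inv}(V) \land A_M(V) \land \delta_M(V,V') \rightarrow \mathit{Inv}(V') \\
 & \mathit{Inv}(V) \rightarrow \bigvee_M A_M(V)
\end{array}
\]

(b)
\[
\begin{array}{ll}
 & \bigwedge_M D_M(V) \land \bigwedge_i \mathit{Init}(V_i) \land \psi(V) \rightarrow \bot \\
{[\forall M]} & \mathit{Bad}(V) \rightarrow D_M(V) \\
{[\forall M]} & \neg\mathit{valid}_M(V) \rightarrow D_M(V) \\
{[\forall M]} & \bigwedge_{M'} D_{M'}(V') \land \delta_M(V,V') \rightarrow D_M(V)
\end{array}
\]

Fig. 4: k-safety formula scheme before (a) and after (b) the transformation.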


Definition 1 (valid schedules). M is a valid schedule for a composed state
(s_1, …, s_k) if either of the following two conditions holds:
1. ∀i ∈ M · s_i ⊭ ξ_i
2. ∀i ∈ M · s_i ⊨ ξ_i and M = {1, …, k}.


Intuitively, the observation points act as a “barrier”. All traces must reach
the observation point before any of them can progress past it; and when they
do, they do it simultaneously.⁴
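
The barrier behaviour can be made explicit by enumerating the valid schedules of a composed state. This sketch assumes the state is summarised by one Boolean per trace saying whether that trace currently satisfies its observation formula; the function name and this summary are illustrative choices.

```python
from itertools import combinations


def valid_schedules(at_observation):
    """All valid schedules for a composed state of k traces.

    at_observation[i - 1] is True when trace i satisfies its observation
    formula in the current composed state.
    """
    k = len(at_observation)
    everyone = tuple(range(1, k + 1))
    result = []
    for size in range(1, k + 1):
        for m in combinations(everyone, size):
            none_at_obs = all(not at_observation[i - 1] for i in m)
            all_at_obs = all(at_observation[i - 1] for i in m)
            if none_at_obs or (all_at_obs and m == everyone):
                result.append(m)
    return result
```

With two traces, a state where neither trace is at its observation point admits all three schedules, a state where only trace 1 is there admits only {2}, and a state where both are there admits only {1,2}.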

To reason about composed states, we define a vocabulary V = V_1 ∪ … ∪ V_k
that consists of the set of state variables of all traces. We encode the arbiter
using a family of unknown predicates {A_M(V)}_M for every ∅ ≠ M ⊆ {1, …, k}
and the inductive invariant using an unknown predicate Inv(V). We express the
situation where all traces reach an observable state but ϕ does not hold using
the constraint: Bad(V) ≜ ⋀_i ξ_i(V_i) ∧ ¬ϕ(V). The joint steps of the traces as
determined by the schedule M are given by the following constraint:

\[
\begin{aligned}
\Delta_M(V,V',a_1,\ldots,a_k) &\triangleq \bigwedge_{i \in M} \mathit{Tr}(V_i,a_i,V_i') \land \bigwedge_{i \notin M} V_i = V_i' \\
\delta_M(V,V') &\triangleq \exists a_1,\ldots,a_k \cdot \Delta_M(V,V',a_1,\ldots,a_k)
\end{aligned}
\]


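Note that the label variables are existentially quantified,⁵ indicating that any
labeled transition can be used. The definition of a valid schedule is captured by:

\[
\mathit{valid}_M(V) \triangleq
\begin{cases}
\bigwedge_{i \in M} \neg\xi_i(V_i) & M \neq \{1,\ldots,k\} \\
\left(\bigwedge_{i \in M} \neg\xi_i(V_i)\right) \lor \left(\bigwedge_{i \in M} \xi_i(V_i)\right) & M = \{1,\ldots,k\}
\end{cases}
\]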


Fig. 4a formalizes the joint requirements of the arbiter and the inductive
invariant that ensures that φ holds. The following theorem summarizes the
soundness of the encoding, which is a slight generalization of the encoding in [41]
(where only pre/post specifications are considered):




4 The requirement that all traces leave the observation point in tandem saves the need 
to record which of them already made a step since the last observation point. 

5 Since δ_M appears on the left-hand side of an implication, existential quantifiers can
be pushed outside as universal quantifiers, resulting in quantifier-free bodies.

     1  sum = 0;
     2  b = *;
     3  if (b > 0) {
     4    i = 0;
     5    while (i < n - 1) {
     6      sum = sum + A[i];
     7      i++;
     8    }
     9  }
    10  else {
    11    i = 1;
    12    while (i < n) {
    13      y = *;
    14      sum = sum + A[i] + y;
    15      i++;
    16    }
    17  }

\[
(A_1 = A_2 \land n_1 = n_2) \rightarrow \forall \pi_1 : pc = 5 \cdot \exists \pi_2 : pc = 5 \lor pc = 12 \cdot \Box(b_2 \le 0 \land sum_1 = sum_2)
\]

Fig. 5: Example for a ∀∃ hyperproperty.


Theorem 2. The set of formulas in Fig. 4a is satisfiable iff TS ⊨ φ.


Example 2. Applying the scheme of Fig. 4a to the program and ∀*∃*-OHyperLTL
specification of the 2-safety property from Fig. 1a results in Fig. 1b, except for
moving constraints to the right hand side of the implication when it assists
readability. Note that in this example, the observation points ξ_i of both traces
correspond to the condition for exiting the loop (which is the negated loop
condition). As a result valid_{i} ≡ a_i < b_i for i ∈ {1,2} and valid_{1,2} ≡ (a_1 <
b_1 ∧ a_2 < b_2) ∨ (¬(a_1 < b_1) ∧ ¬(a_2 < b_2)).


The set of formulas in Fig. 4a fits the general scheme of Fig. 3a. Thus, it
is amenable to our general satisfiability-preserving transformation, which yields
the CHCs in Fig. 4b. Since the transformation is satisfiability preserving, we
obtain the following as a corollary of Thms. 1 and 2:


Corollary 1. The system of CHCs in Fig. 4b is satisfiable iff TS ⊨ φ.


Whereas A_M(V) in Fig. 4a describes the states where choosing schedule M
leads to successful verification with Inv as an inductive invariant, D_M(V) in
Fig. 4b can be understood as describing states where choosing M would prevent
the verification from going through in the sense that no inductive invariant
would exist. In other words, these states are “doomed” if M is chosen, hence
the choice of notation. If the set of CHCs in Fig. 4b is satisfiable, it proves that
initial states that satisfy the pre-condition are not doomed. This intuition can
be interpreted in a dual manner: if the initial states are not doomed, then there
exists an alignment for which a safe inductive invariant exists.


6 Encoding ∀*∃* Hyperproperties as CHCs


In this section we consider the more general case of ∀*∃*-OHyperLTL
specifications. Throughout the section, TS is a transition system, and we fix a formula:

\[
\varphi = \psi \rightarrow \forall \pi_1 : \xi_1, \ldots, \pi_l : \xi_l \cdot \exists \pi_{l+1} : \xi_{l+1}, \ldots, \pi_k : \xi_k \cdot \Box\phi
\]

In order to encode the problem of deciding if TS ⊨ φ as a satisfiability
problem, we follow [8], and consider a game semantics, which is natural due
to the alternation of quantifiers. The ∀ and ∃ quantifiers are “demonic” and
“angelic”, thus controlled by the falsifier and the verifier, respectively.
In the following, we introduce the game semantics of [8] for ∀*∃*-OHyperLTL.
We then encode truth of φ in TS under the game semantics as a satisfiability
problem, and use the transformation from Sec. 4 to obtain a system of CHCs
that is satisfiable iff TS satisfies φ according to the game semantics.


Example 3. To illustrate the game semantics, we use the example in Fig. 5, which
accompanies this section. The presented program computes the sum of an array
slice, nondeterministically choosing between the slice A[0..n − 2] and A[1..n − 1].
For the second variant, an arbitrary integer can be added to each summand.
This allows the program to fulfill the specification at the bottom, which requires
that for every execution there is a corresponding execution of the second variant
(where b_2 ≤ 0) such that the sums at lines 5 and 12 align at every iteration. The
specification is valid because y at line 13 can always be chosen to compensate
for the deviation due to the index i not being the same.

Considering the game semantics, the falsifier first has to choose a value for
b_1, which can be either positive or nonpositive. If it is nonpositive, then the
verifier wins the game vacuously because ξ_1 = (pc = 5) is never reached. If the
choice is positive, then the verifier must choose nonpositive to satisfy b_2 ≤ 0
from the specification. In subsequent steps, the verifier must select a scheduling
that will align pc_1 = 5 and pc_2 = 12 at every iteration, and select a value for
y such that after both assignments (lines 6 and 14) sum_1 = sum_2 is satisfied.
When following these choices, the verifier manages to satisfy sum_1 = sum_2 at
all observation points, which gives it a winning play.


Safety games are played between a verifier, whose goal is to avoid bad states,
and a falsifier who tries to reach a bad state. Formally, the game is a tuple
G = (VS, FS, S_0, δ_V, δ_F, B) where VS are verifier states, in which the verifier
moves, and FS are falsifier states, in which the falsifier moves, and VS ∩ FS = ∅.
The game states are S = VS ∪ FS. S_0 ⊆ S is a set of initial states, and B ⊆ S
is a set of bad states. δ_V ⊆ VS × S defines the possible moves of the verifier and
δ_F ⊆ FS × S those of the falsifier. It is assumed that δ_V, δ_F are total, i.e., there is
at least one move for each player from every state. A play is a sequence of game
states σ_0, σ_1, … such that σ_0 ∈ S_0, and for every i ≥ 0, (σ_i, σ_{i+1}) ∈ δ_V ∪ δ_F.
The play is winning for the verifier if it is infinite and σ_i ∉ B for every i ≥ 0.
A (memoryless) strategy for the verifier is a function χ : VS → S such that
(σ, χ(σ)) ∈ δ_V for every σ ∈ VS. χ is a winning strategy for the verifier if all the
plays in which the verifier moves according to χ are winning for the verifier.
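
For a finite game, the verifier's winning region can be computed with the standard backward fixpoint: a state is losing if it is bad, if it is a falsifier state with some losing successor, or if it is a verifier state all of whose successors are losing. The sketch below is this textbook construction on explicit states, not the CHC-based method of the paper; the function name and the example game are hypothetical.

```python
def solve_safety_game(verifier_states, falsifier_states, moves, bad):
    """Return the verifier's winning region and a memoryless strategy.

    moves maps each state to the list of its successors (assumed non-empty).
    """
    states = set(verifier_states) | set(falsifier_states)
    losing = set(bad)
    changed = True
    while changed:
        changed = False
        for s in states - losing:
            succ = moves[s]
            if s in verifier_states:
                lost = all(t in losing for t in succ)  # verifier cannot escape
            else:
                lost = any(t in losing for t in succ)  # falsifier can force it
            if lost:
                losing.add(s)
                changed = True
    winning = states - losing
    strategy = {s: next(t for t in moves[s] if t in winning)
                for s in verifier_states if s in winning}
    return winning, strategy


moves = {"f0": ["v0"], "v0": ["f1", "f2"], "f1": ["v1"], "v1": ["f1"], "f2": ["f2"]}
winning, strategy = solve_safety_game({"v0", "v1"}, {"f0", "f1", "f2"}, moves, {"f2"})
```

In this example only the bad state f2 is losing, and the strategy sends v0 to f1, steering the play away from f2.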


Game semantics for ∀*∃*-OHyperLTL. Let φ be as above. The game that
captures the semantics of φ is defined with respect to a deterministic labeled
transition system TS = (V, a, Init, Tr). (We can always determinize TS by extending
the set of labels without affecting the semantics; this step may introduce
infinitely many labels, which do not require any special treatment in the definition
of the game, but whose CHC encoding will be addressed in Sec. 6.2.)

The game for φ and TS proceeds in rounds, where in each round the falsifier
makes a move and the verifier responds. The falsifier states are composed states
(of k traces), and the verifier states augment them with a record of the falsifier’s
last move. The bad states are falsifier states where all traces are in their
observation points but ϕ does not hold. The falsifier is responsible for choosing the
transitions that define the ∀ traces t_{1..l} assigned to π_{1..l}. The verifier responds by
choosing the transitions of the ∃ traces t_{l+1..k} assigned to π_{l+1..k}. Here the labels
of the transitions come into play: the players specify the transitions of choice by
picking a label ℓ ∈ L for each trace. (Since TS is deterministic, transitions are
uniquely identified by labels.) The traces then need to be aligned s.t. they
synchronize on their observation points defined by ξ_i. The alignment does not affect
the winner of the play, as long as it is a valid alignment. However, as in the case
of k-safety, the alignment is instrumental for obtaining a winning strategy that
has a simple description. As a result, the choice of the (valid) alignment is also
left to the verifier. Altogether, a move of the falsifier consists of picking labels
ℓ_1, …, ℓ_l ∈ L for the ∀ trace variables; a move of the verifier consists of picking a
valid subset ∅ ≠ M ⊆ {1, …, k} of the traces to progress (as in Sec. 5) and also
labels ℓ_{l+1}, …, ℓ_k ∈ L for the ∃ trace variables, and proceeding to the resulting
composed state.⁶ In this manner, the verifier iteratively “reads off” the states
of t_{1..l}, properly aligned, and generates the traces t_{l+1..k} while avoiding the bad
states. If the verifier can do so indefinitely, then this proves that φ holds.

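Formally, the components of the game are as follows (here, M represents a
valid schedule according to Definition 1):

\[
\begin{aligned}
FS &= S^k \qquad VS = S^k \times L^l \qquad S_0 = \{\bar{s} \in S_0^k \mid \bar{s} \models \psi\} \\
B &= \{\bar{s} \in FS \mid \bar{s} \not\models \phi \text{ and } s_i \models \xi_i \text{ for every } 1 \le i \le k\} \\
\delta_F &= \{(\bar{s}, (\bar{s}, \bar{\ell}^{\forall})) \mid \bar{s} \in FS,\ \bar{\ell}^{\forall} \in L^l\} \\
\delta_V &= \{((\bar{s}, \bar{\ell}^{\forall}), \bar{s}') \mid \bar{s} \xrightarrow{M,\bar{\ell}} \bar{s}' \text{ for } \bar{\ell}^{\exists} \in L^{k-l}\}
\end{aligned}
\]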


The notation $\bar{s} \xrightarrow{M,\bar{\ell}} \bar{s}'$ indicates that s̄′ is obtained from s̄ by taking the
transition with label ℓ_i from s_i whenever i ∈ M, and stuttering otherwise, where
ℓ̄ = (ℓ_1, …, ℓ_k). We refer to it as a transition of the composed system according
to schedule M labeled ℓ̄. The labels are split into ℓ̄^∀ = (ℓ_1, …, ℓ_l) and
ℓ̄^∃ = (ℓ_{l+1}, …, ℓ_k). Formally,

\[
\bar{s} \xrightarrow{M,\bar{\ell}} \bar{s}' \;\triangleq\; \bigwedge_{i \in M} R(s_i, \ell_i, s_i') \land \bigwedge_{i \notin M} s_i = s_i'
\]
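
A composed transition and the verifier's possible responses to a falsifier move can be sketched for a deterministic system given as a function. In the code below, `step(s, label)` is a hypothetical per-copy transition function, the schedules passed in are assumed to be already filtered for validity (Definition 1), and the tiny counter system in the usage note is purely illustrative.

```python
from itertools import product


def composed_step(step, state, schedule, labels):
    """Successor of a composed state: scheduled copies move, the others stutter."""
    return tuple(step(s, l) if i in schedule else s
                 for i, (s, l) in enumerate(zip(state, labels), start=1))


def verifier_moves(step, state, forall_labels, label_set, schedules):
    """Map each verifier choice (schedule, existential labels) to its successor."""
    k, l = len(state), len(forall_labels)
    moves = {}
    for m in schedules:
        for exists_labels in product(label_set, repeat=k - l):
            labels = tuple(forall_labels) + exists_labels
            moves[(m, exists_labels)] = composed_step(step, state, m, labels)
    return moves


# Toy system: a counter that adds the chosen label (0 or 1) to its value.
add_label = lambda s, label: s + label
moves = verifier_moves(add_label, (10, 20), (1,), (0, 1), [(1,), (2,), (1, 2)])
```

With the falsifier having picked label 1 for the first copy, the verifier has six responses; choosing schedule {1,2} with label 1 for the second copy leads to the composed state (11, 21).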


Example 4. In the example of Fig. 5, the labels of transitions are integer values
that reflect the choice of * at lines 2 and 13 (and have no effect on other states).
The verifier and falsifier specify their moves using these labels. For example, in
order to ensure that sum_1 = sum_2 is satisfied at every iteration, the verifier
selects a transition label ℓ = A[i − 1] − A[i] in line 13, which sets the value of y
accordingly; after both assignments at lines 6 and 14, sum_1 = sum_2 holds.


The game semantics of $\forall^*\exists^*$-OHyperLTL is based on the winner in the
verification game:


Definition 2 (Game Semantics for $\forall^*\exists^*$-OHyperLTL [8]). Let $TS$ be a
transition system and $\varphi$ a $\forall^*\exists^*$-OHyperLTL formula. $TS$ satisfies $\varphi$ according
to the game semantics, denoted $TS \models_G \varphi$, if the verifier has a winning strategy
in the verification game $G_{TS,\varphi}$.


⁶ In [8], steps of the verifier are split into two. Our definition is more precise in the sense
that a winning strategy in the game of [8] implies a winning strategy in our game.


Theorem 3 (as shown in [8]). If $TS \models_G \varphi$ then $TS \models \varphi$.
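For intuition, on a finite game the set of falsifier states from which the verifier wins can be computed as a greatest fixpoint. The sketch below is a generic finite-state safety-game solver, not the procedure used in the paper; the function `delta` and the toy game are assumptions made for illustration.

```python
def winning_region(states, bad, falsifier_moves, verifier_moves, delta):
    """Falsifier states from which the verifier can avoid `bad` forever.

    delta(s, a, u) returns the next falsifier state after the falsifier
    plays `a` and the verifier answers `u`, or None if `u` is not applicable.
    """
    win = {s for s in states if s not in bad}
    changed = True
    while changed:
        changed = False
        for s in list(win):
            # Every falsifier move needs some verifier answer that stays in `win`.
            safe = all(
                any(delta(s, a, u) in win for u in verifier_moves)
                for a in falsifier_moves
            )
            if not safe:
                win.discard(s)
                changed = True
    return win


def toy_delta(s, a, u):
    # Toy game: the verifier survives only by copying the falsifier's move.
    return "ok" if s == "ok" and a == u else "bad"


print(winning_region({"ok", "bad"}, {"bad"}, [0, 1], [0, 1], toy_delta))
```

In this toy game the verifier wins from `"ok"` when it may answer with either move, and loses when restricted to a single move.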


6.1 CHC Encoding of the Game with Finite Branching 


To encode the verification game for $\varphi$ and $TS$, we introduce unknown
predicates $\{A_u\}_{u \in U}$ that describe the strategy of the verifier as well as an unknown
predicate $Inv$ that encodes an inductive invariant that ensures that the strategy
is winning. We first consider the case where the set of labels $L$ is finite,
i.e., $TS$ has a finite branching. This makes it possible to define $U$ as the set
of all possible concrete choices of the verifier and introduce a predicate $A_u$ per
every possible choice of the verifier. To do so, we define $U = \mathcal{M} \times L^{k-l}$, where
$\mathcal{M} = \mathcal{P}(\{1,\ldots,k\}) \setminus \{\emptyset\}$ is the set of possible schedules, and $L^{k-l}$ are the choice
labels for constructing the traces assigned to $\{\pi_i\}_{i=l+1..k}$. Note that $U$ is finite
in this case. For each $u = (M, \bar{\ell}^{\exists}) \in U$, the predicate $A_u$ describes the verifier
states in which the verifier chooses $u$ for its move. Recall that verifier states
consist of both the previous state of the verifier, captured by the composed state
vocabulary $V$ defined as before, and the last move of the falsifier, captured by
label variables $(a_1,\ldots,a_l)$. We denote $\mathcal{L}^{\forall} = (a_1,\ldots,a_l)$, $\mathcal{L}^{\exists} = (a_{l+1},\ldots,a_k)$ and
$\mathcal{L} = \mathcal{L}^{\forall} \cup \mathcal{L}^{\exists} = (a_1,\ldots,a_k)$. Then, the $A_u$ predicates are defined over $V \cup \mathcal{L}^{\forall}$.
The $Inv$ predicate is defined over $V$ only, as it describes a set of falsifier states.
The formulas in Fig. 6a formalize the requirements that ensure that $\{A_u\}_u$
defines a winning strategy for the verifier, while accounting for the alternating
choices of the falsifier ($\bar{\ell}^{\forall}$) and verifier ($(M, \bar{\ell}^{\exists})$) in every round, where

$$\Delta_M(V, V', \mathcal{L}) = \bigwedge_{i \in M} Tr(V_i, a_i, V_i') \wedge \bigwedge_{i \notin M} V_i = V_i'$$
$$\delta_{M,\bar{\ell}^{\exists}}(V, V', \mathcal{L}^{\forall}) = \Delta_M(V, V', \mathcal{L})[\mathcal{L}^{\exists} \mapsto \bar{\ell}^{\exists}] \qquad Bad(V) = \bigwedge_i \xi_i(V_i) \wedge \neg\phi(V)$$

$\Delta_M$ is the formula expression of $\xrightarrow{M,\bar{\ell}}$ from above. That is, $\bar{s}, \bar{s}', \bar{\ell}$ (valuations
to $V, V', \mathcal{L}$) satisfy $\Delta_M$ if the composed system according to $M$ has a transition
from $\bar{s}$ to $\bar{s}'$ labeled $\bar{\ell}$. $\delta_{M,\bar{\ell}^{\exists}}$ is then the projection of $\Delta_M$ to a concrete choice
of labels $\bar{\ell}^{\exists}$ for the existentially quantified traces; the labels for the universals,
captured by $\mathcal{L}^{\forall}$, remain free.


Theorem 4. The set of formulas in Fig. 6a is satisfiable iff $TS \models_G \varphi$.
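On a finite game, the requirements on $Inv$ and the strategy predicates can be read as a certificate check. The following Python sketch is an illustrative reading of those requirements, not the paper's encoding: `inv` plays the role of $Inv$, `strategy[u]` plays the role of $A_u$ as a set of (state, falsifier move) pairs, and `delta` returning `None` models a move that is not valid. All names are hypothetical.

```python
def check_certificate(init, bad, falsifier_moves, verifier_moves, delta, inv, strategy):
    """Check that (inv, strategy) certifies a win for the verifier."""
    if not init <= inv:   # initial states must satisfy the invariant
        return False
    if inv & bad:         # the invariant must exclude bad states
        return False
    for s in inv:
        for a in falsifier_moves:
            chosen = [u for u in verifier_moves if (s, a) in strategy.get(u, set())]
            if not chosen:  # the verifier must always have some choice
                return False
            for u in chosen:
                t = delta(s, a, u)
                # Every chosen move must be applicable and preserve the invariant.
                if t is None or t not in inv:
                    return False
    return True


def copy_game(s, a, u):
    # Toy game: the verifier stays safe only by copying the falsifier's move.
    return "ok" if s == "ok" and a == u else "bad"


good = {0: {("ok", 0)}, 1: {("ok", 1)}}
poor = {0: {("ok", 0), ("ok", 1)}}
print(check_certificate({"ok"}, {"bad"}, [0, 1], [0, 1], copy_game, {"ok"}, good))
print(check_certificate({"ok"}, {"bad"}, [0, 1], [0, 1], copy_game, {"ok"}, poor))
```

The first certificate is accepted and the second is rejected, because answering move 1 with move 0 leaves the invariant.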


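Proof. A solution for Fig. 6a induces a winning strategy $\chi$ for the verifier in
the game for $\varphi$ and $TS$: $\chi(\bar{s}, \bar{\ell}^{\forall}) = \bar{s}'$ for $\bar{s} \models Inv$, where $\bar{s}'$ is reached by
choosing $(M, \bar{\ell}^{\exists})$ (i.e., $\bar{s}, \bar{s}', \bar{\ell}^{\forall} \models \delta_{M,\bar{\ell}^{\exists}}$) such that $\bar{s}, \bar{\ell}^{\forall} \models A_{M,\bar{\ell}^{\exists}}$; such $\bar{s}'$ must
exist because the last formula states that there must always be a choice for
the verifier in falsifier states that satisfy $Inv$. For $\bar{s} \not\models Inv$, $\chi(\bar{s}, \bar{\ell}^{\forall})$ is defined
arbitrarily. In the other direction, given a winning strategy for the verifier, we
define the interpretation of $Inv$ to be its winning region and the interpretation
of $A_{M,\bar{\ell}^{\exists}}$ to consist of the falsifier states $(\bar{s}, \bar{\ell}^{\forall})$ where the strategy chooses $\bar{s}'$
such that $\bar{s}, \bar{s}', \bar{\ell}^{\forall} \models \delta_{M,\bar{\ell}^{\exists}}$.


(a)
$$\bigwedge_i Init(V_i) \wedge \psi(V) \to Inv(V)$$
$$Inv(V) \wedge Bad(V) \to \bot$$
$$\boxed{\forall}\;\; Inv(V) \wedge A_{M,\bar{\ell}^{\exists}}(V, \mathcal{L}^{\forall}) \wedge \neg valid_M(V) \to \bot$$
$$\boxed{\forall}\;\; Inv(V) \wedge A_{M,\bar{\ell}^{\exists}}(V, \mathcal{L}^{\forall}) \wedge \delta_{M,\bar{\ell}^{\exists}}(V, V', \mathcal{L}^{\forall}) \to Inv(V')$$
$$Inv(V) \to \bigvee_{(M,\bar{\ell}^{\exists}) \in U} A_{M,\bar{\ell}^{\exists}}(V, \mathcal{L}^{\forall})$$

(b)
$$\bigwedge_{(M,\bar{\ell}^{\exists}) \in U} D_{M,\bar{\ell}^{\exists}}(V) \wedge \bigwedge_i Init(V_i) \wedge \psi(V) \to \bot$$
$$\boxed{\forall}\;\; Bad(V) \to D_{M,\bar{\ell}^{\exists}}(V)$$
$$\boxed{\forall}\;\; \neg valid_M(V) \to D_{M,\bar{\ell}^{\exists}}(V)$$
$$\boxed{\forall}\;\; \bigwedge_{(M',\bar{\ell}'^{\exists}) \in U} D_{M',\bar{\ell}'^{\exists}}(V') \wedge \delta_{M,\bar{\ell}^{\exists}}(V, V') \to D_{M,\bar{\ell}^{\exists}}(V)$$

Fig. 6: A game formula scheme before (a) and after (b) the transformation, where
$\boxed{\forall} \equiv \forall (M, \bar{\ell}^{\exists}) \in U$.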


Remark 2. For k-safety properties, the encoding in Fig. 6a, based on the game
semantics, is equivalent to the encoding in Fig. 4a (Sec. 5). In particular, in this
case, the set $\mathcal{L}^{\exists}$ is empty, which means that $\bar{\ell}^{\exists} = ()$, resulting in a game with
finite branching, namely only the choices of the schedule $M$. Note that for such
properties, the benefits of the game semantics are less obvious since if $TS \models \varphi$,
then every strategy is winning for the verifier.


Encoding safety games in general. The game encoding in Fig. 6a and Thm. 4
are stated here for the specific safety games corresponding to $\forall^*\exists^*$-OHyperLTL
verification in order to avoid additional notational burden. However, the result is
applicable to a more general class of safety games where the moves of the players
are organized in rounds, each of which comprises a move of the falsifier,
followed by a move of the verifier. Furthermore, the states of the verifier are
“intermediate states” defined as $VS = FS \times \Omega$, where $\Omega$ is a set of auxiliary
states used to record the last falsifier move. The initial and bad states are falsifier
states. The verifier moves to a new state according to the previous state together
with the auxiliary state, while the falsifier is only allowed to choose the auxiliary
part of the state. Therefore, $\delta_F \subseteq \{(\bar{s}, (\bar{s}, \omega)) \mid \bar{s} \in FS, \omega \in \Omega\}$. The encoding
extends to such games, where $\bigwedge_i Init(V_i) \wedge \psi(V)$ is replaced by an encoding of $S_0$;
$Bad$ is replaced by an encoding of $B$; $\delta_{M,\bar{\ell}^{\exists}}(V, V', \mathcal{L}^{\forall})$ is replaced by an encoding
of $\delta_V \circ \delta_F$ as a formula where the falsifier state variables and the choices of the
falsifier are free, and $valid_M(V)$ is replaced by a guard encoded over the same
free variables that ensures that the verifier step is applicable. Accordingly, our
subsequent results (including the CHC encoding) extend to any such game.

Applying our transformation to the formulas in Fig. 6a results in the CHCs
in Fig. 6b. Intuitively, $A_{M,\bar{\ell}^{\exists}}$ describe the winning strategy for the verifier: for
“safe” states, represented by $Inv$, and given a move made by the falsifier, if the
verifier chooses to move according to $(M, \bar{\ell}^{\exists})$, then it stays in the “safe” region.
In contrast, $D_{M,\bar{\ell}^{\exists}}$ represents “doomed” states. Namely, if the verifier chooses
to move according to $(M, \bar{\ell}^{\exists})$ from a state in $D_{M,\bar{\ell}^{\exists}}$, then the falsifier can force
reaching a bad state for every choice of the verifier in the next steps of the game.


Corollary 2. The set of CHCs in Fig. 6b is satisfiable iff $TS \models_G \varphi$.


Example 5. The example in Fig. 5 fits the case of finite branching if we assume
that the integer values in the array A and those of sum and t are bounded modulo
$2^m$, and so are the labels $L$. This means that the falsifier has $2^m$ possible steps at
each game state, and the verifier has $3 \cdot 2^m$ (3 is the number of possible schedules
out of {1,2}). In the next subsection we explain how to encode the problem when
the integers are considered to be unbounded.
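The move counts in this example can be checked by enumerating the set of verifier choices directly. The Python sketch below assumes k = 2 traces with l = 1 universally quantified trace and labels drawn from a range of size $2^m$; the helper names are illustrative.

```python
from itertools import combinations, product


def schedules(k):
    """All nonempty subsets of {1, ..., k}."""
    idx = range(1, k + 1)
    return [frozenset(c) for r in range(1, k + 1) for c in combinations(idx, r)]


def falsifier_moves(l, labels):
    """One label per universally quantified trace."""
    return list(product(labels, repeat=l))


def verifier_moves(k, l, labels):
    """A schedule together with one label per existentially quantified trace."""
    return [(M, ex) for M in schedules(k) for ex in product(labels, repeat=k - l)]


m = 3
labels = range(2 ** m)
print(len(falsifier_moves(1, labels)))    # 2^m falsifier steps
print(len(verifier_moves(2, 1, labels)))  # 3 * 2^m verifier steps
```

The size of the verifier's move set grows with the label domain, which is why an infinite label set rules out one predicate per concrete choice.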


6.2 CHC Encoding of the Game with Infinite Branching 


The set of formulas in Fig. 6a and the corresponding system of CHCs in Fig. 6b
are well defined when the set $U$ is finite. However, if $L$ is infinite, so is $U$. In this
case, instead of using $L^{k-l}$ to specify the traces chosen by the verifier, we define
a finite, abstract set of composed labels, denoted $\hat{L}^{\exists}$, to be used by the verifier
(the falsifier will continue to use the concrete labels to specify its transitions of
choice). Each abstract label in $\hat{L}^{\exists}$ is a relational predicate $p$ with free variables $V$
(the composed vocabulary) that relates the states of different traces. Thus, the
vector of individual existential choices $\bar{\ell}^{\exists}$ of the verifier is now replaced with a
single choice of a (relational) predicate $p \in \hat{L}^{\exists}$ over all the copies. In contrast to
the use of concrete labels to specify the (unique) next transition for each trace
individually, an abstract label $p \in \hat{L}^{\exists}$ determines the next transitions for the $\exists$
traces by relating their post-states to the rest of the composed post-state.

Specifically, given a set of labels $\bar{\ell}^{\forall}$ for the $\forall$ traces and a schedule $M$, a
predicate $p \in \hat{L}^{\exists}$ is used as a restriction (inspired by the homonymous concept
from [8]) of the transitions of the composed system according to schedule $M$
with $\forall$-choices $\bar{\ell}^{\forall}$, restricting the set of aforementioned transitions to those where
the composed post-state satisfies $p$.


Example 6. In Fig. 5, at line 13, a nondeterministic integer value is assigned
to variable t. Since the set of integers is infinite, assigning a unique label $\ell$ to
each integer results in an infinite set $L$. To specify the choices of the verifier,
we therefore define a finite set of abstract labels. An example of such a set is
$\hat{L}^{\exists} = \{sum_1 = sum_2,\ sum_1 = y_2,\ sum_1 < y_2,\ sum_1 = sum_2 + A_2[i_2] + y_2\}$. The
restriction $sum_1 = sum_2$ can result in an empty set of transitions (we will return
to this point later in the section); but the restrictions $sum_1 = y_2$, $sum_1 < y_2$
and $sum_1 = sum_2 + A_2[i_2] + y_2$ always define a nonempty set of transitions when
$pc_2 = 13$ and when a schedule $M \supseteq \{2\}$ is chosen: those transitions that choose a
value for $y_2$ such that the predicate holds after the transition; there is always at
least one such value. In fact, for $sum_1 = y_2$ and $sum_1 = sum_2 + A_2[i_2] + y_2$ there
is exactly one such value, while for $sum_1 < y_2$, the set of values (transitions) is
infinite. Note that there are transitions that are not selected by any restriction
(those that assign to $y_2$ a value such that none of the predicates hold).


Thus, the abstract labels define a space of underapproximations of the
transitions of the composed system. This is an underapproximation since some
(combinations of) individual transitions of $TS$ may not be allowed by any $p \in \hat{L}^{\exists}$.

The verifier uses $p \in \hat{L}^{\exists}$ to specify the transitions of the traces assigned
to the existentially quantified variables $\pi_{l+1..k}$. We then require that all of the
composed post-states reached by the verifier’s choice $(M, p)$ are winning for the
verifier. This amounts to proving that all restricted traces satisfy $\Box\phi$, which
would mean that there exist traces that do, as long as the restrictions do not
lead to an empty set of traces. Therefore, to ensure soundness of the encoding,
we require that the restrictions be nonempty. Nonemptiness of the restrictions
also ensures that the choices of the falsifier are never restricted, since the choices
of the falsifier are always singletons (based on the concrete labels).

Rather than limiting the set of predicates used as abstract labels, we ensure 
nonemptiness by applying the restrictions only when the resulting set of transi- 
tions is nonempty; otherwise, the full set of transitions is considered. Technically, 
this is accounted for by special considerations in the construction of the CHC 
encoding, as detailed below. 


CHC encoding. We adapt the formulas in Fig. 6a to the case of abstract labels.
We define $U = \mathcal{M} \times \hat{L}^{\exists}$. The formulas from Fig. 6a carry over, except that the
definition of $\delta_{M,\bar{\ell}^{\exists}}$ from the finite branching case is now replaced with $\delta_{M,p}$, which
captures the transitions according to the abstract labels, as defined below.

For a schedule $\emptyset \neq M \subseteq \{1..k\}$ and $p \in \hat{L}^{\exists}$, we define $allowed_{M,p}$, a formula
that is satisfied by $\bar{s}, \bar{\ell}^{\forall}$ when some transition is possible from $\bar{s}$ with scheduling
$M$ and $\forall$-choice $\bar{\ell}^{\forall}$ such that the target composed state satisfies $p$. This means
that the restriction to $p$ is nonempty. $\delta_{M,p}$ then applies the restriction of the
composed post-state to $p$ only when allowed (otherwise all transitions remain):

$$allowed_{M,p}(V, \mathcal{L}^{\forall}) = \exists V', \mathcal{L}^{\exists}.\ \Delta_M(V, V', \mathcal{L}) \wedge p(V')$$
$$\delta_{M,p}(V, V', \mathcal{L}^{\forall}) = (\exists \mathcal{L}^{\exists}.\ \Delta_M(V, V', \mathcal{L})) \wedge (allowed_{M,p}(V, \mathcal{L}^{\forall}) \to p(V'))$$

The resulting encoding is sound, but, unlike the case of finite branching, not
complete.
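The effect of a restriction can be illustrated on an explicit, finite set of post-states. The following Python sketch is a simplification for intuition only: `successors` stands for the composed post-states reachable under a fixed schedule and a fixed falsifier choice, and the predicate `p` stands for an abstract label. The function names and the bounded value range are assumptions.

```python
def allowed(successors, p):
    """True when at least one reachable post-state satisfies p."""
    return any(p(t) for t in successors)


def restricted_successors(successors, p):
    """Keep only post-states satisfying p, unless that would leave none."""
    if allowed(successors, p):
        return {t for t in successors if p(t)}
    # Empty restriction: fall back to the full set of transitions.
    return set(successors)


# Post-states are pairs (sum1, y2); the verifier's trace picks y2 from a bounded range.
post = {(5, y) for y in range(10)}
print(restricted_successors(post, lambda t: t[0] == t[1]))       # exactly one choice
print(len(restricted_successors(post, lambda t: t[0] < t[1])))   # several choices
print(len(restricted_successors(post, lambda t: t[1] > 100)))    # empty, so lifted
```

Falling back to the full set when the restriction is empty is what keeps the verifier from winning vacuously by choosing an unsatisfiable predicate.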


Theorem 5. If the set of formulas in Fig. 6a adapted to $\hat{L}^{\exists}$ is satisfiable, then
$TS \models_G \varphi$.


Example 7. Going back to the example in Fig. 5, choosing schedule $M = \{2\}$
and restriction $p = (sum_1 = sum_2 + A_2[i_2] + y_2)$ when $pc_2 = 13$ ensures that
the unique value of $y_2$ that satisfies the restriction is selected. With this value
chosen, the assignment of the next line will produce a value of $sum_2$ that is
equal to that of $sum_1$. This gives rise to the following winning strategy (at every
iteration): (i) schedule {1} with any restriction until $pc_1 = 7$; (ii) schedule {2}
until $pc_2 = 13$, then schedule {2} again with $p = (sum_1 = sum_2 + A_2[i_2] + y_2)$,
then {2} again with any restriction; (iii) conclude the iteration by scheduling
{1,2}. As explained, the inductive invariant $sum_1 = sum_2$ is preserved in this
behavior, and there are no “stuck” states (since, by construction of $\delta_{M,p}$, empty
restrictions are lifted to the full set of transitions).


As a corollary of Thm. 5, satisfiability of the aforementioned formulas ensures
that $TS \models \varphi$. To obtain an equi-satisfiable CHC encoding, we apply the
transformation of Sec. 4. The resulting CHC encoding consists of the formulas in
Fig. 6b adapted to use $\hat{L}^{\exists}$ in the same way the formulas in Fig. 6a are adapted.


Corollary 3. If the set of CHCs in Fig. 6b adapted to $\hat{L}^{\exists}$ is satisfiable, then
$TS \models_G \varphi$.


7 Evaluation 


We implemented our CHC-encoding approach in a tool called HyHorn, on top of
Z3 (4.12.0) through its Python API, using SPACER as a CHC solver.
HyHorn takes as input a CFG, or several CFGs, whose transitions are annotated
with two-vocabulary first-order formulas, and constructs a formula expressing
the transition relation $Tr$. The specification is provided as: (i) a quantifier
prefix $\forall\forall$, $\forall\exists$, or $\forall^3$, (ii) observation points $\xi_i$ and (iii) a safety condition $\phi$ that
must hold globally in all observations. From that, the CHC encoding (Sec. 5,
Sec. 6) is constructed and passed to SPACER for solving. HyHorn supports all
first-order theories supported by SPACER (in our experiments, we used the
theories of integer arithmetic and arrays). HyHorn further provides the option to
apply predicate abstraction with a user-provided set of predicates, same as [8].
The abstraction is incorporated into the CHC encoding using the implicit
abstraction encoding [13]. Notably, many of the benchmarks shown here are solved
by HyHorn even without an abstraction, that is, directly over the concrete state.

In the area of hyperproperty verification, several tools already exist, and the
objective of our evaluation is to compare with them. Still, the
field is not mature enough to have a standardized specification format (as is the
case with SMTLIB and SV-COMP, to name a few). As a result, each tool has
its own, opinionated, format, which varies from logical formulas to control-flow
graphs. This makes it technically difficult to compare results of multiple
solutions. In particular, benchmarks taken from previous work come in a range of
formats, dictated by the tools that introduced them. A few of the benchmarks
were translated by previous authors and, thanks to their efforts, are available
in more than one format. For the majority of them, manual work is required
for translating the benchmarks, and, more importantly, there is no one accepted
translation, and the translation can introduce artifacts in the evaluation.

This forced us to prioritize the comparisons in our experiments. We chose
to focus on comparison with the most closely related tools to our work. These
are HyPA [8], PDSC [41], and PCSat [45]. HyPA is the most recent tool, and has
already collected benchmarks from various previous papers (including Weaver [27]);
PDSC and PCSat both use the same first-order encoding as our starting point and
thus are also relevant. HyPA’s benchmarks include, in particular, $\forall^*\exists^*$ examples
such as GNI, and PDSC targets non-trivial alignments, and, as such, all of its
benchmarks have non-lockstep alignments.


k-safety                  | HyHorn (PA) | HyHorn (concrete) | HyPA | PCSat | PDSC
double square NI          | 0.56 | —    | 67.0 | ?     | 6.8
double square NI ff       | 0.12 | —    | ?    | ?     | ?
half square NI            | 0.30 | 0.30 | 63.0 | 13.4  | 3.4
squares sum               | 0.17 | 3.41 | 70.4 | 360.7 | 2.8
squares sum (simplified)  | 0.10 | 0.30 | 17.2 | ?     | ?
array insert              | 0.86 | 134  | ?    | ?     | 18.5
array insert (simplified) | 1.33 | 2.58 | 16.2 | 378.6 | ?
exp1x3                    | 0.08 | 0.09 | 2.9  | ?     | ?
fig 3 [FV19]              | 0.03 | —    | 79   | 47    | ?
fig 2 [BF22]              | ?    | —    | 136  | ?     | ?
col item symm             | 0.49 | 0.49 | 149  | ?     | ?
counter det               | 0.46 | —    | 62   | ?     | ?
mult equiv                | 0.29 | —    | 142  | ?     | ?
mult equiv (simplified)   | 0.19 | —    | 10.3 | ?     | ?
array int mod             | 0.13 | —    | ?    | ?     | 58.2
mult dist [FV19]          | 2.25 | —    | 47   | ?     | ?

$\forall^*\exists^*$      | HyHorn (PA) | HyHorn (concrete) | HyPA
non-det add               | 1.45 | 2.80 | 3.3
counter sum               | 0.09 | —    | 40
async GNI                 | 0.36 | 0.37 | 3.8
compiler opt 1            | 0.14 | 0.19 | 1.8
compiler opt 2            | 0.17 | 0.78 | 2.0
refine                    | 0.18 | 0.29 | 4.0
refine 2                  | 0.28 | 0.65 | 3.9
smaller                   | 0.16 | 0.96 | 2.0
counter diff              | 0.17 | —    | 68
fig 3 [BF22]              | 0.81 | —    | 99
P1 (simple)               | 0.19 | 0.59 | 1.4
P1 (GNI)                  | 0.26 | 0.75 | 138.7
P2 (GNI)                  | 8.50 | 6.65 | 12.8
P3 (GNI)                  | 0.32 | 0.20 | 4.6
P4 (GNI)                  | 0.77 | 0.63 | 27.7

Table 1: Experimental results for k-safety properties (top) and $\forall^*\exists^*$ properties
(bottom). Time is measured in seconds. “—” represents timeouts after 20 minutes.
“/” denotes benchmarks not present in the respective tool’s suite. “?” marks
entries that are illegible in the source.

In benchmark names, [FV19] refers to [27]; [BF22] refers to [8].


Benchmarks. For the evaluation of our approach we use the full sets of
benchmarks from HyPA [8] and PDSC [41]. The benchmarks of HyPA are divided into
k-safety benchmarks, which are adopted from [43,27,41,45], and $\forall^*\exists^*$
benchmarks, which include refinement properties for compiler optimizations, general
refinement of nondeterministic programs and generalized non-interference (GNI).
For two benchmarks, we include both a simplified version as given in [8], as well
as the original example. The benchmarks of PDSC include more non-lockstep
examples, as well as all of the comparator benchmarks from [43]. The comparator
examples consist of both safe and unsafe instances. Weaver [27] considers 12
additional (sequential) k-safety benchmarks. As an additional test case, we
manually translated the running example from Weaver, which is a 3-safety property
with a nontrivial alignment, and tested it with HyHorn; HyHorn solved it in
2.25 seconds when provided with a few simple predicates (inequalities between
program variables). We believe that being the running example makes it a good
representative of the remaining 12. This brings our benchmark suite to a total
of 112 k-safety examples (16 in Table 1 plus 96 comparator benchmarks).


Experiments. To demonstrate the effectiveness of HyHorn we compare to HyPA [8],
the most recent approach to formal verification of $\forall^*\exists^*$-hyperproperties, which
employs a construction using automata. To exhibit the benefits of the direct CHC
encoding we also compare the k-safety examples to PCSat [45] and PDSC [41].
Both encode the k-safety problem using FOL formulas as in Fig. 4a. PCSat uses
a specialized solver for pfwCSP (a fragment of FOL that includes these
formulas), while PDSC solves the FOL formulas by enumerating alignments and using
a CHC solver for each alignment. We do not compare to game solvers since, as
reported by [8], state-of-the-art infinite-state game solvers, such as [26,2], which
work without user-provided predicates, are unable to solve the benchmarks we
consider.

We run HyHorn on the full set of benchmarks, and each of the other tools on
the ones included in their benchmark suite. This is because each tool has its own
input format: HyPA and PDSC each has its own representation for the transition
system and the property; PCSat accepts pfwCSP instances that are constructed
manually. Some of the benchmarks are common to several tools.

All experiments are run on an AMD EPYC 74F3 with 32GB of memory.
HyPA and PCSat are executed in Docker using their published artifacts.⁷


Results. The performance measurements of the tools for the k-safety benchmarks
and for the $\forall^*\exists^*$ benchmarks are shown in Table 1. The results for the
comparator examples are deferred to the extended version of the paper [34]. HyHorn
is tested in two modes: with predicate abstraction (“PA”) and without
(“concrete”). HyPA and PDSC require predefined predicates (the same predicates are
used in all tools), while PCSat does not, but uses hints to solve ‘array insert’
and ‘squares sum’. HyHorn solves almost all of the benchmarks with PA in under
a second, outperforming previous approaches by up to two orders of magnitude;
and also solves most of the benchmarks quickly without PA, especially the $\forall^*\exists^*$
properties. In particular, HyHorn solves the two array benchmarks, while HyPA and
PCSat do not support arrays and only solve simplified versions with integers.
The runtime of HyHorn (both with and without predicates) on the comparator
examples is similar to the runtime of PDSC (see [34]), where HyHorn solves some
benchmarks that PDSC does not. (The other tools do not include these
benchmarks.) On the unsafe examples, HyHorn provides a concrete counterexample,
while PDSC is only able to determine that there is no inductive invariant and
alignment expressible with the given set of predicates.


8 Related Work 


There is a large body of work studying verification of hyperproperties. While
earlier verification techniques mostly focus on k-safety properties, or specific
examples such as program equivalence, monotonicity, determinism
[5,44,3,30,43,47,24,27,41], lately verification of non-safety hyperproperties has
been studied [4,16,45,7,8]. Below we discuss the works closest to ours.


⁷ We evaluated HyHorn in Docker as well. There were no meaningful differences in
runtime.

k-Safety. Automatic verification of k-safety properties can be achieved by
reducing the problem to a standard safety verification problem by means of
self-composition [5], product-programs [3], and their derivatives [47,24]. Recently,
however, it was identified that the alignment of the different copies has a
substantial effect over the complexity of the verification problem [41,27,12]. Our
approach is most related to the technique of Shemer et al. [41], which uses a
semantic alignment that chooses which copy of the system performs a move based
on the composed state of the different copies. They suggest an algorithm that
iterates through the set of possible semantic alignments, such that in each
iteration a CHC solver tries to prove the property, with the chosen alignment, using
predicate abstraction. Unlike [41], HyHorn delegates the search for the alignment
to the CHC solver, together with the search for the invariant, making the
algorithm less dependent on predicate abstraction. Moreover, while [41] is restricted
to k-safety only, our technique can handle k-safety as well as the more general
$\forall^*\exists^*$-OHyperLTL.


$\forall^*\exists^*$ Hyperproperties. Recently, verification of $\forall^*\exists^*$ hyperproperties has been
studied, targeting both finite and infinite systems [45,16,8]. Unno et al. [45]
present an approach based on an encoding of hyperproperties verification as
satisfiability of formulas in FOL that extend Horn form with disjunctions,
existential quantification and well founded relations. Deciding satisfiability of the
generated set of formulas is based on a variant of the CEGIS framework.
HyHorn is different as it encodes $\forall^*\exists^*$-OHyperLTL verification as a set of CHCs,
which does not require a specialized solver and can use any off-the-shelf CHC
solver. Coenen et al. [16] suggested a game-based approach for verification of
$\forall^*\exists^*$ properties over finite-state systems, which was then extended by Beutner
et al. [8] to handle infinite-state systems. Similarly to [8], we use game semantics
to solve $\forall^*\exists^*$ problems, but do not require building the game-graph in order
to solve the game, instead reducing the game solution to satisfiability of CHCs.
It is important to note that in the case of infinite branching degree, while the
approach in [8] explicitly checks for emptiness of restrictions in hindsight, i.e.,
after they are used in a strategy, and removes them iteratively if needed, HyHorn
embeds the emptiness requirements into the set of CHCs. Recently, [7] extended
the game-based approach to use prophecy variables as a way to achieve
completeness of the reduction to games. Extending our approach to this case is a
promising avenue for future research.


Relational CHCs. The authors of [40] present a method for discovering relational
solutions to CHCs. Their setting is different: the inputs are CHCs that serve as
the definition of the transitions, and synchronization is between sets of unknown
predicates; at the current state, only lock-step semantics is considered. Furthermore,
their algorithm extends and modifies SPACER [35], while our approach can use any
CHC solver without modification.


Infinite-State Game Solving. Our approach for verifying $\forall^*\exists^*$ hyperproperties
is based on the game semantics of $\forall^*\exists^*$-OHyperLTL proposed in [16,8].
However, we do not propose a general game solving algorithm. Instead, we use the
game semantics to come up with a first-order encoding of hyperproperty
verification problems, which is then reduced to CHC solving. This allows us to use
any CHC solver when solving the hyperproperty game. There is a large body
of work on solving infinite-state games [21,9,46,26]. One game solving approach
uses three-valued predicate abstraction to reduce the problem to finite-state
game solving and requires iteratively refining the controllable predecessor
operator when computing candidate winning states. The approach in [26]
targets games defined over the theory of linear real arithmetic and is based on an
unrolling of the game and the use of Craig interpolants to synthesize a
winning strategy. The game solver in [2] is not restricted to a given FOL theory, but
requires an interpolation procedure in order to compute sub-goals that are used
to inductively split a game into sub-games. As reported by [8], game solving
approaches [26,2], which work without a provided set of predicates, are unable to
handle the infinite-state games for the benchmarks we consider. Moreover, the
approaches in [26,16,2,8] cannot handle games that are defined using formulas
over the theory of arrays, which are part of our benchmark. The approach of [9]
to solving games over infinite graphs is based on reduction of games (including
safety games) to CHCs. However, unlike the reduction presented in this paper,
in [9] the games are encoded in a different fragment of Horn, namely $\forall\exists$-Horn
where the head predicates can contain existential quantifiers. More recently (and
concurrently with our work), a new reduction of game solving to CHC solving
was proposed. That approach handles safety games in which the branching
degree of the “safe” player (the verifier in our setting) is bounded. In contrast,
our encoding also supports infinite branching with the restrictions mechanism.
Moreover, that approach does not support predicate abstraction, which is crucial
for solving some of our benchmarks.


Restrictions as Underapproximations. The use of restrictions as
underapproximations of the transition relation, inspired by [8], corresponds to the use of must
hyper-transitions [36] in abstract transition systems [42,19] and games [20,22].
Similarly to [29,17], we use such underapproximations to replace an existential
quantifier by universal quantification within the restriction.


9 Conclusion 


We introduced a translation of a family of non-Horn first-order formulas to
CHCs. This translation led to the first CHC encoding of a simultaneous
inference of an invariant and an alignment for verifying k-safety properties. While
the transformation itself is rather simple, identifying it was not straightforward
and eluded previous works on the topic. We have further extended the CHC
encoding to infer a witness function for existentially quantified traces arising in the
verification of $\forall^*\exists^*$-OHyperLTL properties. Our experiments exhibit significant
improvement over state-of-the-art hyperproperty verifiers thanks to the existence
of advanced off-the-shelf CHC solvers, whose efficacy is expected to improve even
further. The approach shows promising capabilities in solving (many)
hyperproperty verification problems completely automatically. In some cases, predicates
still have to be provided by the user, a limitation that we hope to overcome
in the future by automatic inference of predicates. Applying (or extending) the
transformation to obtain CHC encodings for other verification fragments is an
interesting direction for future work.


Acknowledgment. The research leading to these results has received funding
from the European Research Council under the European Union’s Horizon 2020
research and innovation programme (grant agreement No [759102-SVIS]). This
research was partially supported by the Israel Science Foundation (ISF) grants
No. 2875/21 and No. 2117/23, and by the NSF-BSF grant No. 2018675.


References 


1. ANTONOPOULOS, T., KOSKINEN, E., LE, T. C., NAGASAMUDRAM, R., NAUMANN,
D. A., AND NGO, M. An algebra of alignment for relational verification. Proc.
ACM Program. Lang. 7, POPL (jan 2023).

2. BAIER, C., COENEN, N., FINKBEINER, B., FUNKE, F., JANTSCH, S., AND SIBER,
J. Causality-based game solving. In Computer Aided Verification - 33rd
International Conference, CAV 2021, Virtual Event, July 20-23, 2021, Proceedings, Part I
(2021), A. Silva and K. R. M. Leino, Eds., vol. 12759 of Lecture Notes in Computer
Science, Springer, pp. 894-917.

3. BARTHE, G., CRESPO, J. M., AND KUNZ, C. Relational verification using product
programs. In FM 2011: Formal Methods - 17th International Symposium on Formal
Methods, Limerick, Ireland, June 20-24, 2011. Proceedings (2011), pp. 200-214.

4. BARTHE, G., CRESPO, J. M., AND KUNZ, C. Beyond 2-safety: Asymmetric product
programs for relational program verification. In Logical Foundations of Computer
Science, International Symposium, LFCS 2013, San Diego, CA, USA, January
6-8, 2013. Proceedings (2013), S. N. Artëmov and A. Nerode, Eds., vol. 7734 of
Lecture Notes in Computer Science, Springer, pp. 29-43.

5. BARTHE, G., D’ARGENIO, P. R., AND REZK, T. Secure information flow by
self-composition. In 17th IEEE Computer Security Foundations Workshop, (CSFW-17
2004), 28-30 June 2004, Pacific Grove, CA, USA (2004), pp. 100-114.

6. BAUMEISTER, J., COENEN, N., BONAKDARPOUR, B., FINKBEINER, B., AND
SÁNCHEZ, C. A temporal logic for asynchronous hyperproperties. In Computer
Aided Verification - 33rd International Conference, CAV 2021, Virtual Event,
July 20-23, 2021, Proceedings, Part I (2021), A. Silva and K. R. M. Leino, Eds.,
vol. 12759 of Lecture Notes in Computer Science, Springer, pp. 694-717.

7. BEUTNER, R., AND FINKBEINER, B. Prophecy variables for hyperproperty
verification. In 35th IEEE Computer Security Foundations Symposium, CSF 2022,
Haifa, Israel, August 7-10, 2022 (2022), IEEE, pp. 471-485.

8. BEUTNER, R., AND FINKBEINER, B. Software verification of hyperproperties
beyond k-safety. In Computer Aided Verification - 34th International Conference,
CAV 2022, Haifa, Israel, August 7-10, 2022, Proceedings, Part I (2022), S. Shoham
and Y. Vizel, Eds., vol. 13371 of Lecture Notes in Computer Science, Springer,
pp. 341-362.


9. BEYENE, T. A., CHAUDHURI, S., POPEEA, C., AND RYBALCHENKO, A. A 
constraint-based approach to solving games on infinite graphs. In The 41st Annual 
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 
POPL '14, San Diego, CA, USA, January 20-21, 2014 (2014), S. Jagannathan 
and P. Sewell, Eds., ACM, pp. 221-234. 

10. BJØRNER, N., GURFINKEL, A., MCMILLAN, K. L., AND RYBALCHENKO, A. Horn 
clause solvers for program verification. In Fields of Logic and Computation II - 
Essays Dedicated to Yuri Gurevich on the Occasion of His 75th Birthday (2015), 
pp. 24-51. 

11. BJØRNER, N. S., MCMILLAN, K. L., AND RYBALCHENKO, A. On solving univer- 
sally quantified horn clauses. In Static Analysis - 20th International Symposium, 
SAS 2013, Seattle, WA, USA, June 20-22, 2013. Proceedings (2013), F. Logozzo 
and M. Fähndrich, Eds., vol. 7935 of Lecture Notes in Computer Science, Springer, 
pp. 105-125. 

12. CHURCHILL, B. R., PADON, O., SHARMA, R., AND AIKEN, A. Semantic program 
alignment for equivalence checking. In Proceedings of the 40th ACM SIGPLAN 
Conference on Programming Language Design and Implementation, PLDI 2019, 
Phoenix, AZ, USA, June 22-26, 2019 (2019), K. S. McKinley and K. Fisher, Eds., 
ACM, pp. 1027-1040. 

13. CIMATTI, A., GRIGGIO, A., MOVER, S., AND TONETTA, S. IC3 modulo theories via 
implicit predicate abstraction. In Tools and Algorithms for the Construction and 
Analysis of Systems - 20th International Conference, TACAS 2014, Held as Part 
of the European Joint Conferences on Theory and Practice of Software, ETAPS 
2014, Grenoble, France, April 5-13, 2014. Proceedings (2014), E. Ábrahám and 
K. Havelund, Eds., vol. 8413 of Lecture Notes in Computer Science, Springer, 
pp. 46-61. 

14. CLARKSON, M. R., FINKBEINER, B., KOLEINI, M., MICINSKI, K. K., RABE, 
M. N., AND SÁNCHEZ, C. Temporal logics for hyperproperties. In Principles of 
Security and Trust - Third International Conference, POST 2014, Held as Part of 
the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, 
Grenoble, France, April 5-13, 2014, Proceedings (2014), M. Abadi and S. Kremer, 
Eds., vol. 8414 of Lecture Notes in Computer Science, Springer, pp. 265-284. 

15. CLARKSON, M. R., AND SCHNEIDER, F. B. Hyperproperties. J. Comput. Secur. 
18, 6 (2010), 1157-1210. 

16. COENEN, N., FINKBEINER, B., SÁNCHEZ, C., AND TENTRUP, L. Verifying hyper- 
liveness. In Computer Aided Verification - 31st International Conference, CAV 
2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I (2019), 
I. Dillig and S. Tasiran, Eds., vol. 11561 of Lecture Notes in Computer Science, 
Springer, pp. 121-139. 

17. COOK, B., AND KOSKINEN, E. Reasoning about nondeterminism in programs. 
SIGPLAN Not. 48, 6 (jun 2013), 219-230. 

18. CRAIG, W. Three Uses of the Herbrand-Gentzen Theorem in Relating Model 
Theory and Proof Theory. J. of Symbolic Logic 22, 3 (1957), 269-285. 

19. DAMS, D., AND NAMJOSHI, K. S. The existence of finite abstractions for branch- 
ing time model checking. In 19th IEEE Symposium on Logic in Computer Science 
(LICS 2004), 14-17 July 2004, Turku, Finland, Proceedings (2004), IEEE Com- 
puter Society, pp. 335-344. 

20. DE ALFARO, L., GODEFROID, P., AND JAGADEESAN, R. Three-valued abstractions 
of games: Uncertainty, but with precision. In 19th IEEE Symposium on Logic 
in Computer Science (LICS 2004), 14-17 July 2004, Turku, Finland, Proceedings 
(2004), IEEE Computer Society, pp. 170-179. 


21. DE ALFARO, L., HENZINGER, T. A., AND MAJUMDAR, R. Symbolic algorithms for 
infinite-state games. In CONCUR 2001 - Concurrency Theory, 12th International 
Conference, Aalborg, Denmark, August 20-25, 2001, Proceedings (2001), K. G. 
Larsen and M. Nielsen, Eds., vol. 2154 of Lecture Notes in Computer Science, 
Springer, pp. 536-550. 

22. DE ALFARO, L., AND ROY, P. Solving games via three-valued abstraction re- 
finement. In CONCUR 2007 - Concurrency Theory, 18th International Confer- 
ence, CONCUR 2007, Lisbon, Portugal, September 3-8, 2007, Proceedings (2007), 
L. Caires and V. T. Vasconcelos, Eds., vol. 4703 of Lecture Notes in Computer 
Science, Springer, pp. 74-89. 

23. DE MOURA, L. M., AND BJØRNER, N. Z3: an efficient SMT solver. In Tools 
and Algorithms for the Construction and Analysis of Systems, 14th International 
Conference, TACAS 2008, Held as Part of the Joint European Conferences on 
Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29- 
April 6, 2008. Proceedings (2008), pp. 337-340. 

24. EILERS, M., MÜLLER, P., AND HITZ, S. Modular product programs. In Pro- 
gramming Languages and Systems - 27th European Symposium on Programming, 
ESOP 2018, Held as Part of the European Joint Conferences on Theory and Prac- 
tice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings 
(2018), pp. 502-529. 

25. FAELLA, M., AND PARLATO, G. Reachability games modulo theories with a 
bounded safety player. Proceedings of the AAAI Conference on Artificial Intel- 
ligence 37, 5 (June 2023), 6330-6337. 

26. FARZAN, A., AND KINCAID, Z. Strategy synthesis for linear arithmetic games. 
Proc. ACM Program. Lang. 2, POPL (2018), 61:1-61:30. 

27. FARZAN, A., AND VANDIKAS, A. Automated hypersafety verification. In Computer 
Aided Verification - 31st International Conference, CAV 2019, New York City, NY, 
USA, July 15-18, 2019, Proceedings, Part I (2019), I. Dillig and S. Tasiran, Eds., 
vol. 11561 of Lecture Notes in Computer Science, Springer, pp. 200-218. 

28. FEDYUKOVICH, G., KAUFMAN, S. J., AND BODÍK, R. Sampling invariants from 
frequency distributions. In 2017 Formal Methods in Computer Aided Design, FM- 
CAD 2017, Vienna, Austria, October 2-6, 2017 (2017), D. Stewart and G. Weis- 
senbacher, Eds., IEEE, pp. 100-107. 

29. GODEFROID, P., NORI, A. V., RAJAMANI, S. K., AND TETALI, S. Composi- 
tional may-must program analysis: unleashing the power of alternation. In Pro- 
ceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Pro- 
gramming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010 (2010), 
M. V. Hermenegildo and J. Palsberg, Eds., ACM, pp. 43-56. 

30. GODLIN, B., AND STRICHMAN, O. Regression verification: proving the equivalence 
of similar programs. Softw. Test. Verification Reliab. 23, 3 (2013), 241-258. 

31. GURFINKEL, A. Program verification with constrained horn clauses (invited paper). 
In Computer Aided Verification - 34th International Conference, CAV 2022, Haifa, 
Israel, August 7-10, 2022, Proceedings, Part I (2022), S. Shoham and Y. Vizel, 
Eds., vol. 13371 of Lecture Notes in Computer Science, Springer, pp. 19-29. 

32. HODER, K., AND BJØRNER, N. S. Generalized property directed reachability. In 
Theory and Applications of Satisfiability Testing - SAT 2012 - 15th International 
Conference, Trento, Italy, June 17-20, 2012. Proceedings (2012), A. Cimatti and 
R. Sebastiani, Eds., vol. 7317 of Lecture Notes in Computer Science, Springer, 
pp. 157-171. 


33. HOJJAT, H., AND RÜMMER, P. The ELDARICA horn solver. In 2018 Formal 
Methods in Computer Aided Design, FMCAD 2018, Austin, TX, USA, October 30 
- November 2, 2018 (2018), N. S. Bjørner and A. Gurfinkel, Eds., IEEE, pp. 1-7. 

34. ITZHAKY, S., SHOHAM, S., AND VIZEL, Y. Hyperproperty verification as CHC 
satisfiability. Available at 

35. KOMURAVELLI, A., GURFINKEL, A., AND CHAKI, S. SMT-based model check- 
ing for recursive programs. In Computer Aided Verification - 26th International 
Conference, CAV 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, 
Vienna, Austria, July 18-22, 2014. Proceedings (2014), pp. 17-34. 

36. LARSEN, K. G., AND LIU, X. Equation solving using modal transition systems. In 
Proceedings of the Fifth Annual Symposium on Logic in Computer Science (LICS 
'90), Philadelphia, Pennsylvania, USA, June 4-7, 1990 (1990), IEEE Computer 
Society, pp. 108-117. 

37. LEWIS, H. R. Renaming a set of clauses as a horn set. J. ACM 25, 1 (1978), 
134-135. 

38. MCCULLOUGH, D. Noninterference and the composability of security properties. 
In Proceedings of the 1988 IEEE Symposium on Security and Privacy, Oakland, 
California, USA, April 18-21, 1988 (1988), IEEE Computer Society, pp. 177-186. 

39. MCMILLAN, K. L. Lazy annotation revisited. In Computer Aided Verification - 
26th International Conference, CAV 2014, Held as Part of the Vienna Summer of 
Logic, VSL 2014, Vienna, Austria, July 18-22, 2014. Proceedings (2014), A. Biere 
and R. Bloem, Eds., vol. 8559 of Lecture Notes in Computer Science, Springer, 
pp. 243-259. 

40. MORDVINOV, D., AND FEDYUKOVICH, G. Property directed inference of relational 
invariants. In 2019 Formal Methods in Computer Aided Design, FMCAD 2019, 
San Jose, CA, USA, October 22-25, 2019 (2019), C. W. Barrett and J. Yang, Eds., 
IEEE, pp. 152-160. 

41. SHEMER, R., GURFINKEL, A., SHOHAM, S., AND VIZEL, Y. Property directed 
self composition. In Computer Aided Verification - 31st International Conference, 
CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I (2019), 
I. Dillig and S. Tasiran, Eds., vol. 11561 of Lecture Notes in Computer Science, 
Springer, pp. 161-179. 

42. SHOHAM, S., AND GRUMBERG, O. Monotonic abstraction-refinement for CTL. In 
Tools and Algorithms for the Construction and Analysis of Systems, 10th Interna- 
tional Conference, TACAS 2004, Held as Part of the Joint European Conferences 
on Theory and Practice of Software, ETAPS 2004, Barcelona, Spain, March 29 - 
April 2, 2004, Proceedings (2004), K. Jensen and A. Podelski, Eds., vol. 2988 of 
Lecture Notes in Computer Science, Springer, pp. 546-560. 

43. SOUSA, M., AND DILLIG, I. Cartesian hoare logic for verifying k-safety properties. 
In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language 
Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 
2016 (2016), pp. 57-69. 

44. TERAUCHI, T., AND AIKEN, A. Secure information flow as a safety problem. In 
Static Analysis, 12th International Symposium, SAS 2005, London, UK, September 
7-9, 2005, Proceedings (2005), pp. 352-367. 

45. UNNO, H., TERAUCHI, T., AND KOSKINEN, E. Constraint-based relational ver- 
ification. In Computer Aided Verification - 33rd International Conference, CAV 
2021, Virtual Event, July 20-23, 2021, Proceedings, Part I (2021), A. Silva and 
K. R. M. Leino, Eds., vol. 12759 of Lecture Notes in Computer Science, Springer, 
pp. 742-766. 


46. WALKER, A., AND RYZHYK, L. Predicate abstraction for reactive synthesis. In 
Formal Methods in Computer-Aided Design, FMCAD 2014, Lausanne, Switzer- 
land, October 21-24, 2014 (2014), IEEE, pp. 219-226. 

47. YANG, W., VIZEL, Y., SUBRAMANYAN, P., GUPTA, A., AND MALIK, S. Lazy 
self-composition for security verification. In Computer Aided Verification - 30th 
International Conference, CAV 2018, Held as Part of the Federated Logic Con- 
ference, FLoC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part II (2018), 
H. Chockler and G. Weissenbacher, Eds., vol. 10982 of Lecture Notes in Computer 
Science, Springer, pp. 136-156. 

48. ZAKS, A., AND PNUELI, A. Covac: Compiler validation by program analysis of the 
cross-product. In FM 2008: Formal Methods, 15th International Symposium on 
Formal Methods, Turku, Finland, May 26-30, 2008, Proceedings (2008), pp. 35- 
51. 

49. ZHU, H., MAGILL, S., AND JAGANNATHAN, S. A data-driven CHC solver. In 
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language 
Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018 
(2018), J. S. Foster and D. Grossman, Eds., ACM, pp. 707-721. 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 


Program Analysis 




Maximal Quantified Precondition Synthesis for 
Linear Array Loops 


Sumanth Prabhu (1,2), Grigory Fedyukovich (3), and Deepak D’Souza (2) 


1 Tata Consultancy Services Research, Pune, India 
2 Indian Institute of Science, Bengaluru, India 
3 Florida State University, Tallahassee, USA 
sumanth.prabhu@tcs.com, grigory@cs.fsu.edu, deepakd@iisc.ac.in 


Abstract. Precondition inference is an important problem with many 
applications in verification and testing. Finding preconditions can be 
tricky as programs often have loops and arrays, which necessitates find- 
ing quantified inductive invariants. However, existing techniques have 
limitations in finding such invariants, especially when preconditions are 
missing. Further, maximal (or weakest) preconditions are often required 
to maximize the usefulness of preconditions. So the inferred inductive 
invariants have to be adequately weak. To address these challenges, we 
present an approach for maximal quantified precondition inference using 
an infer-check-weaken framework. Preconditions and inductive invari- 
ants are inferred by a novel technique called range abduction, and then 
checked for maximality and weakened if required. Range abduction at- 
tempts to propagate the given quantified postcondition backwards and 
then strengthen or weaken it as needed to establish inductiveness. Weak- 
ening is done in a syntax-guided fashion. Our evaluation performed on a 
set of public benchmarks demonstrates that the technique significantly 
outperforms existing techniques in finding maximal preconditions and 
inductive invariants. 


1 Introduction 


Many practical problems in software development, verification, and testing rely 
on good and nontrivial preconditions for programs. Preconditions can be consid- 
ered as a constraint on a program’s input or used to filter out input values of a 
program at run-time. While performing verification in a backward fashion, pre- 
conditions are used to summarize loops and functions. The maximal (or logically 
weakest) precondition is desirable in all these applications. Such preconditions 
can be derived in various methods [45] 4]5411 3)53]2571314¢). 

However, precondition inference is known to be difficult for programs with 
unbounded loops, as it requires reasoning about possible behaviors in any 
loop iteration. This necessitates the inference of inductive invariants that de- 
scribe a set of states from which a new iteration can begin and cannot escape. 
This task becomes particularly challenging in the presence of data structures 
like arbitrarily-sized arrays. When reasoning about array elements, solvers are 
expected to support quantifiers, but existing techniques [30,40,34,32,22] have 
many limitations. 

Fig. 1: An overview of our infer-check-weaken framework. 

We present a new technique to automatically infer maximal quantified pre- 
conditions for deterministic programs that manipulate arrays and have linear ar- 
ray loops. These loops are non-nested and terminating loops with unique counter 
variables. The postconditions can have either universal or existential quantifica- 
tion. Since such programs can model many practical programs, several techniques 
target them, but for assertion checking and not precondition inference [39,10]. 
Moreover, we show in this paper that precondition inference is undecidable for 
this class of programs. 

An overview of our algorithm is shown in Fig. 1. The algorithm operates in 
an “infer-check-weaken” framework. Our algorithm views the problem as solving 
a system of constrained Horn clauses (CHCs), which are logical systems to rep- 
resent the verification conditions of programs, with a missing precondition pre. 
A valid solution for this system is inferred by an abduction-based algorithm, i.e., 
by systematically answering questions like “what state at the beginning of 
an iteration could yield a given state at the end of the iteration?” The solution 
is then checked for maximality by inferring another precondition (cpre) for a 
system that uses the same CHC encoding of the loop and the complemented 
(i.e., negated) postcondition. If the solution is not maximal, it is weakened in- 
crementally in a counter-example-guided loop. 

The inference algorithm begins with the weakest possible candidate solution 
and propagates the given quantified postcondition towards the program’s en- 
try point. In the process, it strengthens the candidate solution using our novel 
technique called range abduction. Range abduction finds a strengthening of quan- 
tified formulas by reduction (wherever possible) to abduction over quantifier-free 
formulas. The obtained formulas are combined with the range formula [22] that 
essentially represents a boundary between the indices of arrays that are already 
processed and indices that are yet to be processed. Such a predicate can be 
obtained using lightweight static analysis over the structure of the CHCs. The 
inference algorithm uses the HOUDINI technique [24] to weaken a solution. 

Intuitively, range abduction for linear array loops seeks to pose two integer 
abduction queries over indices that are modified and indices that are not modi- 
fied. Integer abduction has been used in invariant inference [17,18], precondition 
inference [27,16], and specification synthesis [2,50]. At a lower level, abduction 
is often implemented using quantifier elimination, but in our setting the formulas 
must use quantifiers over array indices that should not be eliminated. Range 
abduction is designed specifically for this application. 

Although efficient, range abduction does not guarantee maximality, and our 
inference is followed by two additional steps: maximality checking and weakening. 
The maximality checker tries to determine whether all the states outside the 
current precondition lead to a violation of the assertion. If they do, the current 
precondition is maximal. Otherwise, there is at least one state that can be added 
to the current precondition, and hence an attempt to weaken the precondition 
is made. The weakening module weakens a precondition and infers an inductive 
invariant for it using a syntax-guided-synthesis based method. 

A prior framework for finding specifications (including preconditions) follows 
a similar approach of iteratively inferring solutions. However, it is based on integer 
abduction and SMT-based maximality checking, and it is applicable 
only to array-free programs. Furthermore, it does not guarantee maximality 
in some cases [29]. We observed experimentally that extending the SMT-based 
maximality checking algorithm of that framework with quantified formulas over arrays 
makes the tool diverge. This motivated us to design a new maximality checker that uses 
cpre of the complement system and range abduction. 

We have implemented our algorithm in a tool called PREQSYN, which takes 
CHCs as input. On a challenging set of 32 benchmarks, PREQSYN significantly 
outperforms a prior maximal quantified precondition inference tool P-GEN [54]. 
PREQSYN automatically found 31 preconditions and proved 21 of them to be 
maximal, while in contrast P-GEN found only 2 maximal preconditions and in 
most cases did not find any preconditions. We also show that a variety of existing 
array verification tools, such as VERIABS [15], SPACER [82], and FREQHORN [23], 
find it hard even to verify the preconditions we discovered for these benchmarks. 
Our tool not only solves them by finding preconditions, but also finds the 
maximal ones in most cases. 

The core contributions of this paper are: 


1. An algorithm, based on a new technique called range abduction, to infer 
universal and existential quantified preconditions and invariants, effective 
on linear array loop programs. 

2. New methods to check maximality and weaken preconditions. 

3. A tool that implements the algorithms to infer maximal preconditions and 
can be used as a CHC solver. 

4. Experimental evaluation demonstrating the effectiveness of the algorithm. 


In the rest of the paper, we motivate the problem with an example in Sect. 2. 
The necessary background on abduction and CHCs is provided in Sect. 3. A 
proof of undecidability of the problem is in Sect. 4. Sect. 5 presents an overview 
of our inference algorithm and an illustration on the example. The details of 
range abduction are in Sect. 6, while Sect. 7 has the maximality checking and 
weakening algorithms. Our experimental evaluation can be found in Sect. 8. We discuss 
related work in Sect. 9 and conclude with limitations and future work in Sect. 10. 

int N = nondetInt(); 
int A[N], B[N], C[N]; 
assume(pre(A, B, C, N)); // goal: find maximal pre 
for (int i = 0; i < N; i++) 
    if (2*i < N) C[i] = i; 
    else A[i] = C[i]; 
assert(∀j. 0 ≤ j < N ⟹ A[j] == B[j]); // postcondition 


Fig. 2: C-like example with a universally quantified postcondition and no pre- 
condition. 


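\[
\begin{array}{lr}
\mathit{pre}(N,A,B,C) \land i = 0 \Rightarrow \mathit{inv}(i,N,A,B,C) & (C1)\\
\mathit{inv}(i,N,A,B,C) \land i < N \land 2*i < N \land C' = \mathit{store}(C,i,i) \land i' = i+1 \Rightarrow \mathit{inv}(i',N,A,B,C') & (C2)\\
\mathit{inv}(i,N,A,B,C) \land i < N \land 2*i \ge N \land A' = \mathit{store}(A,i,C[i]) \land i' = i+1 \Rightarrow \mathit{inv}(i',N,A',B,C) & (C3)\\
\mathit{inv}(i,N,A,B,C) \land \neg(i < N) \land \neg(\forall j.\ 0 \le j < N \Rightarrow A[j] = B[j]) \Rightarrow \bot & (C4)
\end{array}
\]

Fig. 3: CHC encoding of the program in Fig. 2. 
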
int N = nondetInt(); 
int A[N], B[N], C[N]; 
assume(cpre(A, B, C, N)); 
for (int i = 0; i < N; i++) 
    if (2*i < N) C[i] = i; 
    else A[i] = C[i]; 
assert(∃j. 0 ≤ j < N ∧ A[j] != B[j]); // complemented post 


Fig. 4: The program used to check maximality; the postcondition is comple- 
mented and has no precondition. 


2 Motivating Example 


We motivate the problem with the program shown in Fig. 2, which has three finite-length, 
statically allocated arrays A, B, and C, each of size N. The arrays 
are accessed sequentially in the loop: the cells in the first half of C are assigned 
their corresponding indices, and the remaining elements of C are copied to the 
corresponding positions in A. The program ends with the postcondition stating 
the pairwise equality of A and B. Our goal is to find the maximal precondition 
under which the postcondition holds. Intuitively, such a precondition must be 
universally quantified because it must express that arrays A, B, and C are 
properly initialized up to an arbitrary length N. 

Further, in order to prove that the postcondition indeed holds after the loop 
has terminated, we have to show that there exists an inductive invariant that 
is also universally quantified. To confirm that the precondition is logically the 
weakest, we need to formally prove that any attempt to extend it by a single 
point leads to a violation of the postcondition. Thus, the solution we target 
should have two properties: 1) it should allow us to find an inductive invariant 
for the loop, and 2) any weakening of it should result in a counterexample that violates 
the assertion. 

The only publicly available tool for finding quantified preconditions, P-GEN [54], 
which is based on predicate abstraction, is unable to solve this program. The 
last candidate precondition it tries to refine is $N = 3 \land A[0] = B[0] \land A[1] = 
B[1] \land A[2] = B[2]$, which does not constrain the values of array $C$, thus allowing 
the program to violate the postcondition, e.g., when $C[2] \neq B[2]$ initially and 
$A[2]$ is assigned $C[2]$ in the else-branch when $i = 2$. 
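
The violation described above can be replayed concretely. The following is a minimal Python sketch, not part of the paper's tooling: `run_loop`, `candidate`, and `post` are hypothetical helper names, and the concrete array values are chosen only for illustration.

```python
def run_loop(n, a, b, c):
    """Execute the loop of Fig. 2 on copies of the arrays; return the final A."""
    a, c = list(a), list(c)
    for i in range(n):
        if 2 * i < n:
            c[i] = i          # first half: C[i] = i
        else:
            a[i] = c[i]       # second half: A[i] = C[i]
    return a


def candidate(n, a, b, c):
    """The last candidate reported for P-GEN: N = 3 and A, B agree on 0..2."""
    return n == 3 and all(a[j] == b[j] for j in range(3))


def post(n, a, b):
    """Postcondition of Fig. 2: A and B agree on every index."""
    return all(a[j] == b[j] for j in range(n))


# Illustrative input (an assumption): A and B agree, but C[2] != B[2].
n, a, b, c = 3, [7, 7, 7], [7, 7, 7], [0, 0, 9]
final_a = run_loop(n, a, b, c)
```

Here the input satisfies the candidate, yet `final_a[2]` takes the value of `C[2]`, so the postcondition fails; any sound precondition therefore has to mention `C`.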

Fig. 3 shows a system of CHCs over relations pre and inv, representing the 
verification conditions of the program in Fig. 2. For brevity, we do not mention 
the universal quantification over all program variables including arrays, which 
is implicit. In particular, the first CHC identifies the initial value of the counter 
but does not give any constraints over A, B, or C (which are essentially deferred 
to pre). The next two CHCs encode the loop body, corresponding to the two 
possible branches in the body of the loop. The last CHC encodes that no state 
satisfying the negation of the assertion is reachable. 

The missing precondition makes the CHC system in Fig. 3 different from the 
CHC systems that appear in verification tasks. Hence, existing CHC solvers are 
not directly applicable here, as they can return the strongest solution, $\bot$. For 
instance, SPACER (Z3 v4.12.2) returns the solution $\mathit{pre} \mapsto \lambda N, A, B, C.\ \bot$ 
and $\mathit{inv} \mapsto \lambda i, N, A, B, C.\ \bot$. Such vacuous solutions are not of much use in the 
applications mentioned earlier. 

The CHC system also represents a maximal specification problem, with pre 
being the specification of an initialization function. However, existing maximal 
precondition synthesis techniques do not support synthesizing quantified 
preconditions over arrays. 

Our algorithm takes the input CHC system and works in an infer-check-weaken 
fashion as shown in Fig. 1. First, the infer module strengthens and 
weakens the postcondition from the last CHC via range abduction and HOUDINI, 
respectively, to find the following precondition (a detailed illustration follows in 
Sect. 5.2): 

\[
\lambda N, A, B, C.\ \forall j.\,(0 \le j < N \land 2*j < N \Rightarrow A[j] = B[j]) \;\land\; \forall j.\,(0 \le j < N \land 2*j \ge N \Rightarrow B[j] = C[j]).
\]


We note that this is the maximal precondition for this problem instance, 
but in general we may not find the maximal precondition in the first 
iteration. In any case, we need to check the maximality of the inferred precondition. 
Our maximality checker does this by trying to find a precondition for 
the complement of the postcondition (called the “complement program”, see 
Fig. 4). This is achieved by calling the infer module again, albeit with an existentially 
quantified postcondition. By using the existentially quantified structure 
of the postcondition, the infer module discovers the following precondition (details 
are given later in the paper): 

\[
\lambda N, A, B, C.\ \exists j.\,(0 \le j < N \land 2*j \ge N \land B[j] \neq C[j]).
\]

The maximality checker now tries to determine whether all the points that are 
outside the precondition of the original program are indeed in the precondition 
of the complement program. For the example program, this is encoded as the 
following formula: 

\[
\begin{array}{l}
\forall A, B, C, N.\ \bigl(\neg\bigl(\forall j.\,(0 \le j < N \land 2*j < N \Rightarrow A[j] = B[j]) \land \forall j.\,(0 \le j < N \land 2*j \ge N \Rightarrow B[j] = C[j])\bigr) \\
\qquad \Rightarrow \exists j.\,(0 \le j < N \land 2*j \ge N \land B[j] \neq C[j])\bigr).
\end{array}
\]


If the formula were valid, the current precondition would be maximal, since all the 
states outside it would violate the property (as they would be in the precondition of 
the complement program). In this example, the implication is not valid, because 
N = 3, A = [0,0,0], B = [1,0,0], C = [0,0,0] is a counterexample to validity. 
Our approach then weakens the precondition of the complement program based 
on the counterexample to the validity check: 

\[
\lambda N, A, B, C.\ \exists j.\,(0 \le j < N \land 2*j \ge N \land B[j] \neq C[j]) \;\lor\; \exists j.\,(0 \le j < N \land 2*j < N \land A[j] \neq B[j]).
\]


The checker now conducts a successful validity check, and the algorithm termi- 
nates. 
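
The check described in this section can be replayed by brute force on small inputs. The sketch below is only a bounded illustration in Python, not the SMT-based procedure of the paper; the function names and the bounds (`max_n`, `values`) are assumptions made for the example.

```python
from itertools import product


def run_loop(n, a, c):
    """Execute the loop of Fig. 2 on copies of A and C; return the final A."""
    a, c = list(a), list(c)
    for i in range(n):
        if 2 * i < n:
            c[i] = i
        else:
            a[i] = c[i]
    return a


def pre(n, a, b, c):
    """Inferred precondition of the original program."""
    return (all(a[j] == b[j] for j in range(n) if 2 * j < n)
            and all(b[j] == c[j] for j in range(n) if 2 * j >= n))


def cpre(n, a, b, c):
    """First precondition inferred for the complement program."""
    return any(b[j] != c[j] for j in range(n) if 2 * j >= n)


def cpre_weak(n, a, b, c):
    """Weakened precondition of the complement program."""
    return cpre(n, a, b, c) or any(a[j] != b[j] for j in range(n) if 2 * j < n)


def states(max_n=3, values=(0, 1)):
    """Enumerate all inputs with N <= max_n and array cells drawn from values."""
    for n in range(max_n + 1):
        arrays = list(product(values, repeat=n))
        for a, b, c in product(arrays, repeat=3):
            yield n, a, b, c


def agreement_failures():
    """Inputs on which pre disagrees with 'the run satisfies the postcondition'."""
    return [s for s in states()
            if pre(*s) != all(x == y for x, y in zip(run_loop(s[0], s[1], s[3]), s[2]))]


def maximality_counterexamples(cpre_f):
    """Inputs outside pre that are also outside the given complement precondition."""
    return [s for s in states() if not pre(*s) and not cpre_f(*s)]
```

On these bounded inputs, `maximality_counterexamples(cpre)` contains the counterexample from the text, whereas `maximality_counterexamples(cpre_weak)` is empty. A bounded enumeration of this kind can only refute maximality, not prove it, which is why the paper relies on a validity check instead.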


3 Background 


This paper builds largely on foundations of Satisfiability Modulo Theories (SMT) 
problems. SMT aims to determine the existence of an assignment to the variables of 
a first-order logic formula that makes it true. We work in the logical 
setting $\mathcal{L}$ of linear integer arithmetic (LIA) with arrays. The signature of the 
logic includes a finite set of uninterpreted relation symbols $\mathcal{R}$. Each symbol $r$ 
in $\mathcal{R}$ has an associated arity and an associated type, which indicates a type 
(integer or array) for each argument of the relation. 

We write $\varphi(x_1,\ldots,x_n)$ (where each $x_i$ is a variable with an associated 
integer/array type) to denote a formula $\varphi$ of this logic that does not use any of 
the relation symbols in $\mathcal{R}$ and whose free variables are among $\{x_1,\ldots,x_n\}$. For 
convenience, we also write $\varphi(\vec{x})$ to denote the same. For a formula $\varphi(\vec{x})$ and 
an assignment $m$ that maps the variables in $\vec{x}$ to concrete integers/arrays, we 
write $m \models \varphi$ to denote that $\varphi$ evaluates to $\top$ under $m$, and say that $m$ satisfies $\varphi$, 
or that $m$ is a model of $\varphi$. A formula $\psi$ is logically weaker than a formula $\varphi$ 
(denoted $\varphi \Rightarrow \psi$) if every model of $\varphi$ also satisfies $\psi$. Hence $\varphi \Rightarrow \bot$ 
denotes that $\varphi$ is unsatisfiable. 

An interpretation for a relation symbol $r \in \mathcal{R}$ is defined as a map of the form 
$\lambda x_1 \ldots \lambda x_n.\ \varphi(x_1,\ldots,x_n)$, where $\varphi$ is a well-typed first-order formula that does 
not contain any symbols from $\mathcal{R}$. 

We now present formal definitions of the concepts that will be used in the 
rest of the paper. 


3.1 Abduction 


Definition 1. Let $\vec{x}$ and $\vec{y}$ be vectors of variables such that the variables in $\vec{x}$ 
are also present in $\vec{y}$. Let $\alpha(\vec{y})$ and $\beta(\vec{y})$ be formulas without any relation symbols 
from $\mathcal{R}$, with free variables in $\vec{y}$. Let $r$ be an uninterpreted relation in $\mathcal{R}$, of arity 
equal to the length of $\vec{x}$. Consider a formula of the form $r(\vec{x}) \land \alpha(\vec{y}) \Rightarrow \beta(\vec{y})$. 
The abduction problem is to find an interpretation $\varphi$ for $r$ such that: 

1. $\varphi(\vec{x}) \land \alpha(\vec{y}) \not\Rightarrow \bot$, and 
2. $\varphi(\vec{x}) \land \alpha(\vec{y}) \Rightarrow \beta(\vec{y})$. 


Intuitively, the problem of abduction is to find a formula $\varphi$ that together with 
$\alpha$ entails the formula $\beta$ in a non-trivial manner. For a given 
abduction problem there may be multiple solutions, but we are interested in a 
maximal one (i.e., logically weakest), whenever a solution exists. The techniques 
in [27,16,2] compute such a maximal solution for first-order theories that admit 
quantifier elimination. This solution is succinctly presented in the lemma below. 


Lemma 1. Let $r(\vec{x}) \land \alpha(\vec{y}) \Rightarrow \beta(\vec{y})$ be an abduction problem where the underlying 
first-order theory has a method $\mathrm{QE}(\vec{q}, \psi)$ whose result is a $\vec{q}$-free formula 
constructed by (existential) quantifier elimination of the variables $\vec{q}$ from the formula 
$\psi$. Suppose that the given instance has a solution. Then the following formula 
$\varphi(\vec{x})$ forms a maximal solution for the abduction problem: 

\[
\varphi(\vec{x}) \stackrel{\text{def}}{=} \neg\bigl(\mathrm{QE}(\vec{y} \setminus \vec{x},\ \alpha \land \neg\beta)\bigr).
\]


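Example 1. Consider an instance of the abduction problem $r(x) \land y = 0 \Rightarrow 
x > y$. Then $\varphi(x)$ is computed as follows: 

\[
\varphi(x) = \neg\bigl(\mathrm{QE}(\{x,y\} \setminus \{x\},\ y = 0 \land x \le y)\bigr) = \neg\bigl(\mathrm{QE}(y,\ y = 0 \land x \le y)\bigr) = \neg(x \le 0) = x > 0.
\]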
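
Example 1 can be replayed numerically. The following Python sketch is an illustration under a strong assumption: the integers are replaced by the finite range `DOMAIN`, and quantifier elimination is replaced by explicit enumeration over that range. The names `abduce` and `qe_exists_y` are hypothetical and do not refer to any solver API.

```python
DOMAIN = range(-5, 6)  # bounded stand-in for the integers (an assumption)


def alpha(x, y):
    """Known constraint of Example 1: y = 0."""
    return y == 0


def beta(x, y):
    """Goal of Example 1: x > y."""
    return x > y


def qe_exists_y(formula):
    """Bounded stand-in for QE: maps x to (exists y in DOMAIN. formula(x, y))."""
    return lambda x: any(formula(x, y) for y in DOMAIN)


def abduce(alpha_f, beta_f):
    """Lemma 1 on the bounded domain: phi(x) = not QE(y, alpha and not beta)."""
    eliminated = qe_exists_y(lambda x, y: alpha_f(x, y) and not beta_f(x, y))
    return lambda x: not eliminated(x)


phi = abduce(alpha, beta)
solution = [x for x in DOMAIN if phi(x)]

# The two conditions of Definition 1, checked by enumeration.
consistent = any(phi(x) and alpha(x, y) for x in DOMAIN for y in DOMAIN)
entails = all(beta(x, y) for x in DOMAIN for y in DOMAIN if phi(x) and alpha(x, y))
```

Within the chosen range, `solution` consists exactly of the positive values, matching $x > 0$. Every excluded value of `x` falsifies `beta` for `y = 0`, which is the sense in which the solution is maximal.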


3.2 Modeling Programs With Constrained Horn Clauses 


Constrained Horn clauses (CHCs) are becoming increasingly 
popular as an intermediate logical representation of programs and 
their proof obligations. Dealing directly with CHCs as opposed to program state- 
ments is convenient and allows for easier creation and handling of various SMT 
formulas and constructed invariants. 


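Definition 2. A CHC (in the logic $\mathcal{L}$) is a formula in $\mathcal{L}$ that has the form of 
one of the following three implications: 

\[
\begin{array}{lr}
\forall \vec{x}_1.\ (\varphi_1(\vec{x}_1) \Rightarrow r_1(\vec{x}_1)) & (1)\\
\forall \vec{x}_1, \vec{x}_2.\ (r_1(\vec{x}_1) \land \varphi_2(\vec{x}_1, \vec{x}_2) \Rightarrow r_2(\vec{x}_2)) & (2)\\
\forall \vec{x}_1.\ (r_1(\vec{x}_1) \land \varphi_3(\vec{x}_1) \Rightarrow \bot) & (3)
\end{array}
\]

where: 

– $r_1, r_2 \in \mathcal{R}$ are uninterpreted relation symbols, where $r_1$ and $r_2$ may coincide; 
– $\vec{x}_1$, $\vec{x}_2$ are vectors of variables; 
– the vectors $\vec{x}_1$ and $\vec{x}_2$ have no common elements; and 
– the formulas $\varphi_i$, called constraints, have no uninterpreted symbols from $\mathcal{R}$. 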


We introduce some auxiliary notation below for convenience. For a CHC $C$: 

– body(C) (resp. head(C)) denotes the left (resp. right) side of the implication 
in $C$, 
– rel(body(C)) denotes the (singleton or empty) set of relation symbols in $\mathcal{R}$ 
that appear in body(C), 
– when $C$ is of type (2), rel(head(C)) denotes the singleton set $\{r_2\}$ containing 
the relation appearing in head(C), 
– when $C$ is of type (2), args(body(C)) (a.k.a. source variables) denotes $\vec{x}_1$ 
and args(head(C)) (a.k.a. destination variables) denotes $\vec{x}_2$. 


A CHC of type (1) is called a fact, and a CHC of type (3) is called a query. For 
simplicity, for a query $C$, we write rel(head(C)) = rel($\bot$) = $\bot$. In the literature, 
the CHCs we are considering are called linear, as there is at most one relation 
symbol in the body of a CHC. A system of CHCs is a finite non-empty set of 
CHCs. 

We assume that our precondition inference problem is represented by a system
of CHCs S without any fact⁴ and there is a designated relation pre (or
cpre) that appears in rel(body(C)) for some CHC C in S and does not appear
in rel(head(C′)) for any other CHC C′ in S. Furthermore, we assume that there
is a single query in S with a constraint of the form $\varphi \wedge p$, where $\neg p$ is the
postcondition in the inference problem.
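The shape of a linear CHC and the assumptions on a precondition inference problem can be sketched as a small data structure. This is only an illustration: the class `CHC`, its field names, and the helper `is_precondition_problem` are hypothetical, constraints are kept as uninterpreted text, and the example system is read off from the running example rather than taken from any implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CHC:
    """A linear CHC: at most one relation symbol in the body (hypothetical encoding)."""
    body_rel: Optional[str]   # None means no relation in the body, i.e. a fact
    constraint: str           # interpreted constraint, kept as plain text here
    head_rel: Optional[str]   # None stands for the head "false" of a query

    def kind(self) -> str:
        if self.body_rel is None:
            return "fact"     # type (1)
        if self.head_rel is None:
            return "query"    # type (3)
        return "rule"         # type (2)


def is_precondition_problem(system: List[CHC], pre: str) -> bool:
    """Check the assumptions stated above: no fact, a single query, and a
    relation `pre` that occurs in some body but in no head."""
    no_facts = all(c.kind() != "fact" for c in system)
    one_query = sum(c.kind() == "query" for c in system) == 1
    pre_in_body = any(c.body_rel == pre for c in system)
    pre_not_in_head = all(c.head_rel != pre for c in system)
    return no_facts and one_query and pre_in_body and pre_not_in_head


# Running example, with constraints written informally (an assumption).
EXAMPLE = [
    CHC("pre", "i = 0", "inv"),
    CHC("inv", "i < N and 2*i < N and C' = store(C, i, i) and i' = i + 1", "inv"),
    CHC("inv", "i < N and 2*i >= N and A' = store(A, i, C[i]) and i' = i + 1", "inv"),
    CHC("inv", "not (i < N) and not (forall j. 0 <= j < N => A[j] = B[j])", None),
]
```

Here `is_precondition_problem(EXAMPLE, "pre")` holds, while choosing `"inv"` as the designated relation fails because `inv` also occurs in heads.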

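CHCs allow for flexibility of program encoding. For instance, it is safe to
assume that each $\varphi_C$ is in Conjunctive Normal Form (CNF). For if C had the
following form:

$$r_1(\vec{x}_1) \wedge (\varphi_1(\vec{x}_1, \vec{x}_2) \vee \varphi_2(\vec{x}_1, \vec{x}_2)) \Rightarrow r_2(\vec{x}_2),$$

it can be transformed into two CHCs:

$$r_1(\vec{x}_1) \wedge \varphi_1(\vec{x}_1, \vec{x}_2) \Rightarrow r_2(\vec{x}_2) \quad \text{and} \quad r_1(\vec{x}_1) \wedge \varphi_2(\vec{x}_1, \vec{x}_2) \Rightarrow r_2(\vec{x}_2).$$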


Definition 3 (CHC Solution and Satisfiability). A solution to a system of
CHCs S is a map M that provides an interpretation for each relation symbol in
$\mathcal{R}$, such that for each CHC C in S, $(body(C) \Rightarrow head(C))[M/\mathcal{R}]$⁵ is valid.
In this case we say M is inductive at C. We say S is satisfiable if there exists a
solution to it.


Definition 4 (Maximal Precondition). Let S be a system of CHCs for a pre- 
condition inference problem. We call a solution M to S (precondition) maximal 
if there is no solution M’ to S with M'(pre) strictly logically weaker (i.e. w.r.t. 
the implication partial order) than M(pre). M(pre) is also called the weakest 
precondition. 


4 A fact CHC represents the initial condition of the program. Since pre is in the place 
of initial condition in our task, there will not be a fact CHC. 

5 For a formula $\varphi$, terms/formulas a and b, we write $\varphi[b/a]$ to denote $\varphi$ after all
instances of a are replaced by b. For a set of terms/formulas X and a mapping M
from X to other terms/formulas, $\varphi[M/X]$ denotes the simultaneous replacement of
all $x_1, x_2, \ldots \in X$ by $M(x_1), M(x_2), \ldots$, respectively.

We now define certain terms that will be used in weakening of a precondition
(Sect. 7).


Definition 5 (Complement System). Given a system of CHCs S, we define
a complement system $\overline{S}$ to be the system obtained from S by replacing $p$ by $\neg p$
in the query CHC.


Definition 6 (CHC Extension). Given a system of CHCs S with pre and
an interpretation $\varphi$ for pre, we define $S_\varphi$, an extension of S w.r.t. $\varphi$, to be
the system obtained from S by replacing pre by $\varphi$ in the CHC $C \in S$, where
rel(body(C)) = {pre}.

Lemma 2. Given S with pre and its extension $S_\varphi$, if $M_\varphi$ is a solution to $S_\varphi$,
then $M = \{\lambda r \in \mathcal{R}.\ \text{if } r = pre \text{ then } \varphi \text{ else } M_\varphi(r)\}$ is a solution to S.


To encode program executions, we borrow the notion of CHC unrolling
from [21]. Essentially, a CHC unrolling is a symbolic representation of a set
of program executions starting from a state satisfying $\varphi$. If the unrolling is
satisfiable then the execution terminates in the postcondition.

Definition 7 (Unrolling of CHCs). Given an extended CHC system $S_\varphi$ over
$\mathcal{R}$, let $C_0, \ldots, C_k$ be a $(k+1)$-length sequence of CHCs in $S_\varphi$, with $C_0$ being a
fact, $C_k$ being a query, and rel(head($C_i$)) = rel(body($C_{i+1}$)) for each $i$. Then, a
$k$-length unrolling of $S_\varphi$ is defined as below:

$$\pi_{\langle C_0, \ldots, C_k \rangle} = \bigwedge_{0 \le i < k} body(C_i)(\vec{x}_i, \vec{x}_{i+1}) \wedge (body(C_k)[\neg p / p])(\vec{x}_k)$$


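Example 2. Consider the CHC system S from Fig. 3. Let $\varphi$ be:

$$(\forall j.\ 0 \le j < N \Rightarrow A[j] = B[j] = 0 \wedge C[j] = 1) \wedge N = 1$$

Then the unrolling $\pi$ along the first CHC, the second CHC, and the query, which is a
3-length unrolling of $S_\varphi$, is the following satisfiable formula:

$$\begin{aligned}
\pi = {} & (\forall j.\ 0 \le j < N \Rightarrow A[j] = B[j] = 0 \wedge C[j] = 1) \wedge N = 1 \wedge i = 0 \wedge {} \\
& i < N \wedge 2 * i < N \wedge C' = store(C, i, i) \wedge i' = i + 1 \wedge {} \\
& \neg(i' < N) \wedge (\forall j.\ 0 \le j < N \Rightarrow A[j] = B[j]).
\end{aligned}$$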


Our technique addresses deterministic programs. A non-deterministic program
in our context has an initial state that can both satisfy and violate the
postcondition. More formally,

Definition 8 (Non-deterministic Modulo Postcondition CHCs). Let S
be a system of CHCs that has pre and is extendable by a formula $\varphi$. We call
S non-deterministic modulo postcondition if there exists a uniquely satisfiable
formula $s$ for which there are at least two satisfiable unrollings $\pi_{\langle C_0, \ldots, C_k \rangle}$ and
$\overline{\pi}_{\langle C_0, \ldots, C_m \rangle}$ corresponding to extensions $S_s$ and $\overline{S}_s$, respectively. Otherwise, we
say S is deterministic⁶.


6 An example is presented in [49].

We assume that the CHCs represent terminating programs. Hence, for
any initial state of a program encoded in CHCs, there exists an unrolling either
satisfying or violating the postcondition.


Definition 9 (Terminating CHCs). Let S be a system of CHCs with pre
and extendable by a formula $\varphi$. We say S is terminating if there does not exist
an infinite-length unrolling for $S_\top$ and $\overline{S}_\top$ (i.e. S and $\overline{S}$ extended by $\varphi = \top$).


3.3 Linear Array Loop Programs 


Though our algorithms work at the level of CHCs, we are motivated to target
CHCs representing linear array loop programs (or “linear loops” in short) that
model real-world programs in existing array program verification works [9,39,10].
These are terminating programs with non-nested loops. We now present the
syntax of a linear loop.


program → assume(pre(V, A)); stmts; post;
stmts   → assign | forloop | stmts; stmts
assign  → v = f(V, A) | a[i] = f(V, A) | if ($\phi$(V)) {assign} else {assign}
          | assign; assign
forloop → for (i = l(V); c(i, V); i = h(i)) {assign}
post    → assert($\forall x.\ R(x, V) \Rightarrow Q(x, V, A)$) | assert($\exists x.\ R(x, V) \wedge Q(x, V, A)$)


Here V and A are disjoint sets of integer and array variables, respectively,
$i \in V$ is a loop counter, $v \ne i \in V$ is an integer variable, f is a term over
V and A such that any access to A is done by i, l is an integer term over
V, h is an integer term over i which results in a monotonically increasing (or
decreasing) assignment, c is a guard of the form i < u or i > u for some
integer term u over V, and $\phi$ is a boolean predicate. The postcondition p is given
as a condition in assert, where R is a predicate in LIA over quantified and
integer variables that represents a range of array elements, and Q is a property
over an array with array read-access done only by x. For example, the formula
$\forall x.\ 0 \le x < N \Rightarrow B[x] = 42$ is in this form, where B is an array variable.

The precondition (and inductive invariants) inferred by our algorithm will
have the same quantification as the postcondition. Further, it can be a conjunction
in the case of universal quantification and a disjunction in the case of existential
quantification. Specifically, we consider preconditions (and inductive invariants)
of the form described in (4). Such a form has been found effective in inferring
inductive invariants in the existing works for array programs like [88,33,32,22].


$$\bigwedge (\forall x.\ R(x, V) \Rightarrow Q(x, V, A)) \quad \text{or} \quad \bigvee (\exists x.\ R(x, V) \wedge Q(x, V, A)) \qquad (4)$$


A formal description of CHCs that represent linear loop programs is given in
Sect. 6.1.

4 Undecidability of Maximal Precondition Inference for
Linear Loops


Although linear loops and postconditions have syntactic restrictions, inference
of maximal preconditions for such programs in the considered form (i.e. (4)) is
still undecidable. In this section we prove this result.

We reduce the halting problem of two-counter machines [42] to the maximal
precondition inference problem. Recall that a two-counter machine $M =
(C_1, C_2, L)$ has two counters $C_1$ and $C_2$, which are initially set to 0, and a finite
set of instructions $L = \{l_1, \ldots, l_n\}$, where each instruction $l_i$ is of type inc or
decjz, together with a designated halt instruction $l_h$. Given a two-counter machine
$M = (C_1, C_2, L)$, deciding whether it halts, i.e. whether the halt instruction $l_h \in L$ is
reached, is undecidable.


Theorem 1. The problem of computing the maximal precondition for linear
array loop programs in the form described in (4) is uncomputable.


Proof Sketch. We construct a linear array loop program with a single loop whose
body simulates the execution of one transition of a two-counter machine, and an
array records the locations the machine can reach after the transition.⁷

The undecidability of the problem notwithstanding, many real-life programs,
like industrial battery controllers [9], adhere to linear array loop structures.
Consequently, techniques have been developed to address such programs,
but they focus on assertion checking rather than precondition inference. The
existing precondition inference technique [54] finds it challenging to infer a
precondition for such programs (details in Sect. 8). Motivated by these challenges,
we propose a sound technique that infers maximal preconditions.


5 Inferring Preconditions and Invariants by Abduction 


In this section, we give an overview of our approach for abductive inference of 
preconditions and inductive invariants. We first explain its basic principles, and 
then demonstrate them on the running example. 


5.1 Overview 


We assume that the input system of CHCs S represents a precondition inference 
problem, i.e. it has no facts, a single query, and a designated relation (pre or 
cpre) for the precondition. Since we are interested in the precondition inference 
for array programs, we assume that the query has a quantified constraint p. 
The high-level algorithm is given in Algorithm 1. It is called INFERABD and
is inspired by an earlier work on specification synthesis [50]. INFERABD
incrementally attempts to discover an interpretation for each uninterpreted predicate
in $\mathcal{R}$ by propagating the assertion backward, strengthening it when needed to
establish inductiveness, or weakening it if something went wrong during the inference
of inductive invariants.


7 All proofs are in [49].

Algorithm 1: INFERABD(S, M, R)

Input: S — set of CHCs over $\mathcal{R}$, $R \subseteq \mathcal{R}$ — current subset (initially empty) of
       relations with invariants/preconditions, M — mapping from $\mathcal{R}$ to
       predicates, initially $\lambda r.\ \top$

Output: M — invariants/preconditions of S

 1  if $R = \emptyset$ then
 2      $R \leftarrow \{r \mid \exists s.\ rel(body(s)) = r \wedge rel(head(s)) = \bot \wedge s \in S\}$;
 3  $Worklist \leftarrow \{C \mid C \in S \wedge rel(body(C)) \in R\}$;
 4  while $\exists C \in Worklist$. CHECKSAT($\neg(body(C) \Rightarrow head(C))[M/\mathcal{R}]$) do
 5      let $\varphi$ be $(body(C) \Rightarrow head(C))[M/\mathcal{R}]$;
 6      $M(rel(body(C))) \leftarrow M(rel(body(C))) \wedge$ ABDUCE($\varphi$, args(body(C)), S);
 7      $M(rel(body(C))) \leftarrow$ HOUDINI(S, M, R);
 8  if no $M(\cdot)$ was strengthened or weakened then
 9      if $R = \mathcal{R}$ then return M;
10      $R \leftarrow \{r \mid \exists C \in S.\ rel(body(C)) = r \wedge rel(head(C)) \in R\}$;
11  return INFERABD(S, M, R);

INFERABD (Algorithm 1) constructs a solution M for a system of CHCs
recursively. M initially maps all the predicates in $\mathcal{R}$ to $\top$. At each call, the
algorithm searches for a CHC C (line 3) such that M is not inductive at C.
This inductiveness check is reduced to a satisfiability check, which is performed
by an SMT solver (line 4). If the negated implication is satisfiable then M is not
inductive at the corresponding C, and thus M needs strengthening.

Note that in the first call of INFERABD, the initial M is inductive for all 
the CHCs except the query, thus the interpretations will be created for the 
predicates that appear in the body of the query. In the subsequent calls, these 
interpretations could be either strengthened or propagated through the bodies 
of the CHCs where they appear in the heads, towards the precondition. 

In INFERABD, we write $\psi \leftarrow$ ABDUCE($\varphi$, $\vec{x}$, S) to denote an invocation of a
new abduction algorithm (Algorithm 2) to obtain a formula $\psi$ over variables $\vec{x}$
that makes $\varphi$ valid. INFERABD uses ABDUCE as existing abduction solvers have
limited support for arrays. In order to support arrays and quantifiers, ABDUCE
abstracts quantified formulas over arrays and integers into quantifier-free formulas
only over integers. To do this, ABDUCE considers two abduction queries for
a CHC in S: 1) for the array element that is being rewritten (if any), and 2)
for all other elements that are not changed. The formal description of ABDUCE
is in Sect. 6 along with an illustration. However, by doing this “arrays-to-integer”
reduction, ABDUCE could introduce some imprecision, which is fixed by running
the HOUDINI algorithm (details in Sect. 6.3).

INFERABD may not terminate because the series of strengthening predicates
obtained in each iteration may diverge. But the recursion in INFERABD can be
easily augmented by a threshold condition that forces termination with an
UNKNOWN result after reaching a predetermined recursion depth.


Theorem 2. Whenever Algorithm 1 terminates, it returns a solution M to S.


5.2 Approach in Action 


We demonstrate the precondition inference approach on the example from Sect. 
and Fig. 


Synthesizing an invariant for inv. The algorithm begins by obtaining an initial
candidate interpretation for inv from the query CHC. The predicate is the query
constraint (i.e. the postcondition $\neg p$) with a slight modification:


$$inv \mapsto \lambda i, N, A, B, C.\ \forall j.\ (0 \le j < N \wedge j < i \Rightarrow A[j] = B[j]).$$


The modification consists of dropping the loop condition and strengthening the
candidate by conjoining a range formula [21] to the antecedent ($j < i$ here). In simple
terms, the range formula is a predicate that represents the boundary between
indices that are modified and not modified. It can be ($j < i$) or ($j > i$), based
on whether the loop counter is increasing or decreasing, respectively (formal
definition in Definition 11).

Our algorithm then checks if any of the CHCs in Worklist are not valid. 
In this case, the second CHC is not valid. The algorithm then follows backward 
reasoning and attempts to update the current interpretation of inv by abductive 
strengthening to make it inductive using a series of SMT checks and quantifier 
elimination queries. 

The algorithm does abductive strengthening by posing two queries. The first
one is to accommodate the write to the i-th element of the array. This strengthening
for the second CHC is posed as an abduction query for $\psi_1$ that is constructed
by restricting to only a single cell of the array that is rewritten in the loop:


$$\psi_1(A, B, C, j) \wedge A'[j] = A[j] \wedge B'[j] = B[j] \wedge C'[j] = i \Rightarrow A'[j] = B'[j].$$


Here, all the array terms (like $A[j]$, $A'[j]$, $B[j]$, etc.) are further replaced by fresh
integer variables, which allows us to use a standard abduction solver and get the
following solution:

$$\psi_1 \mapsto A[j] = B[j].$$

Intuitively, $\psi_1$ gives the weakest precondition on $A[i]$, $B[i]$ and $C[i]$ before
the i-th loop iteration, such that the desired postcondition holds for $A'[i]$ and
$B'[i]$ after the iteration.
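To make the integer abduction step concrete, the following sketch computes the weakest abducible for the first query by brute force over a tiny finite domain. It is a stand-in under stated assumptions, not the solver used in the paper: the function `weakest_abduct`, the variable names `A2`, `B2`, `C2` (for the primed cell values), and the three-element domain are all hypothetical, and a real abduction solver would use quantifier elimination over the integers instead of enumeration.

```python
from itertools import product

DOM = range(-1, 2)  # tiny finite stand-in for the integers (assumption)


def weakest_abduct(keep, others, hypothesis, goal, dom=DOM):
    """Return all assignments to `keep` such that hypothesis => goal holds
    for every assignment to `others` (universal elimination of `others`)."""
    solutions = set()
    for kept_vals in product(dom, repeat=len(keep)):
        env = dict(zip(keep, kept_vals))
        envs = ({**env, **dict(zip(others, ov))}
                for ov in product(dom, repeat=len(others)))
        if all(goal(e) for e in envs if hypothesis(e)):
            solutions.add(kept_vals)
    return solutions


# First abduction query after array cells are replaced by integer variables:
# psi1(A, B, C) and A2 = A and B2 = B and C2 = i  =>  A2 = B2
def hypothesis(e):
    return e["A2"] == e["A"] and e["B2"] == e["B"] and e["C2"] == e["i"]


def goal(e):
    return e["A2"] == e["B2"]


psi1 = weakest_abduct(("A", "B", "C"), ("i", "A2", "B2", "C2"), hypothesis, goal)
```

On this domain, `psi1` contains exactly the triples with `A == B`, which matches the reading that the cell of A must already agree with the cell of B before the iteration.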

The second abduction query accommodates all the other elements in the
range $0 \le j < N \wedge j \ne i$ that are unaffected in the i-th iteration:


$$\psi_2(A, B, C, j) \wedge A'[j] = A[j] \wedge C'[j] = C[j] \wedge B'[j] = B[j] \Rightarrow A'[j] = B'[j].$$

The delta w.r.t. the first query is the conjunct $C'[j] = C[j]$. This query also has the same
solution as $\psi_1$.


To build the new invariant candidate from $\psi_2$ and $\psi_1$,
we split the array range into two segments based on the range formula and its
negation:


$$\begin{aligned}
inv \mapsto \lambda i, N, A, B, C.\ & \forall j.\ (0 \le j < N \wedge j < i \Rightarrow A[j] = B[j]) \wedge {} \\
& \forall j.\ (0 \le j < N \wedge j < i \wedge 2 * j < N \Rightarrow A[j] = B[j]) \wedge {} \\
& \forall j.\ (0 \le j < N \wedge j \ge i \wedge 2 * j < N \Rightarrow A[j] = B[j]).
\end{aligned}$$


The second conjunct is derived from $\psi_2$ and the range formula ($j < i$),
whereas the third conjunct is from $\psi_1$ and the negation of the range formula ($j \ge i$).
If the CHC has any additional constraints (like $2 * i < N$ here), they are
added to the antecedent as well.

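While validating this candidate, the algorithm goes over the CHCs again and
checks the implications: it now turns out to be not inductive for the third CHC.
The algorithm thus repeats the abductive strengthening and poses two queries:


$$\psi_3(A, B, C, j) \wedge B'[j] = B[j] \wedge C'[j] = C[j] \wedge A'[j] = C[j] \Rightarrow A'[j] = B'[j],$$
$$\psi_4(A, B, C, j) \wedge B'[j] = B[j] \wedge C'[j] = C[j] \wedge A'[j] = A[j] \Rightarrow A'[j] = B'[j],$$


$$\begin{aligned}
inv \mapsto \lambda i, N, A, B, C.\ & \forall j.\ (0 \le j < N \wedge j < i \Rightarrow A[j] = B[j]) \wedge {} \\
& \forall j.\ (0 \le j < N \wedge j < i \wedge 2 * j < N \Rightarrow A[j] = B[j]) \wedge {} \\
& \forall j.\ (0 \le j < N \wedge j \ge i \wedge 2 * j < N \Rightarrow A[j] = B[j]) \wedge {} \\
& \forall j.\ (0 \le j < N \wedge j < i \wedge 2 * j \ge N \Rightarrow A[j] = B[j]) \wedge {} \\
& \forall j.\ (0 \le j < N \wedge j \ge i \wedge 2 * j \ge N \Rightarrow B[j] = C[j]).
\end{aligned}$$


Synthesizing pre. Finally, the precondition is obtained from the solution for inv.
Because the first CHC initializes the counter $i$ to zero, all the conjuncts with
$j < i$ simplify to true and the rest simplifies to:


$$\begin{aligned}
pre \mapsto \lambda N, A, B, C.\ & \forall j.\ (0 \le j < N \wedge 2 * j < N \Rightarrow A[j] = B[j]) \wedge {} \\
& \forall j.\ (0 \le j < N \wedge 2 * j \ge N \Rightarrow B[j] = C[j]).
\end{aligned}$$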
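As a sanity check on the inferred precondition, one can execute the loop on all small inputs and compare. The sketch below assumes a Python rendering of the loop as read off from the CHCs of the running example (write `C[i] = i` when `2*i < N`, otherwise `A[i] = C[i]`); the function names and the bounds `max_n` and `values` are hypothetical, and an exhaustive check on small instances is evidence, not a proof.

```python
from itertools import product


def run(N, A, B, C):
    """Execute the assumed loop on copies of the arrays; return final (A, B, C)."""
    A, C = list(A), list(C)
    for i in range(N):
        if 2 * i < N:
            C[i] = i
        else:
            A[i] = C[i]
    return A, list(B), C


def post(N, A, B, C):
    # Postcondition of the running example: forall j. 0 <= j < N => A[j] = B[j]
    return all(A[j] == B[j] for j in range(N))


def pre(N, A, B, C):
    # The inferred precondition, with its two universally quantified conjuncts.
    return (all(A[j] == B[j] for j in range(N) if 2 * j < N)
            and all(B[j] == C[j] for j in range(N) if 2 * j >= N))


def pre_is_exact(max_n=3, values=(0, 1)):
    """True if, on all small inputs, pre holds exactly when the run ends in post."""
    for N in range(max_n + 1):
        for cells in product(values, repeat=3 * N):
            A, B, C = cells[:N], cells[N:2 * N], cells[2 * N:]
            if pre(N, A, B, C) != post(N, *run(N, A, B, C)):
                return False
    return True
```

On these bounded inputs `pre_is_exact()` holds, which is consistent with the inferred precondition being both sufficient and necessary for this loop.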


6 Range Abduction 


In this section, we present our technique called range abduction for inferring
quantified invariants, and subsequently, quantified preconditions. We define the
ABDUCE method for quantified formulas over arrays and linear arithmetic that
can be used in the general algorithm of abductive invariant synthesis. Its core
features include the capability to selectively apply quantifier elimination, such
that it keeps all quantifiers that are explicit in the abducible formula. As its main
computational vehicle, the method uses quantifier elimination over linear arithmetic
on formulas produced from the actual abducibles by over-approximating
(as precisely as possible) the array computation.


6.1 Preliminaries 


CHCs. We first formally describe the CHC structure that we support, corresponding
to linear loops. We assume that the inputs are given as CHCs, where bodies
are in CNF (otherwise, they can be transformed following Sect. 3.2). For each CHC,
we consider two disjoint vectors of source (resp., destination) variables, $\vec{v}$ and $\vec{a}$
(resp., $\vec{v}'$ and $\vec{a}'$), such that only $\vec{a}$ (resp., $\vec{a}'$) consists of array variables.

We allow only a single index to access elements of all arrays $b \in \vec{a}$ in each CHC
C, and without loss of generality we assume that it is an integer variable $i \in \vec{v}$
(usually, a loop counter).⁸ For simplicity, we also introduce a set of temporary
integer variables $\vec{t}$ that store some elements selected from arrays and can be
used in other parts of C (e.g., to compute the next value to be written to an
array $b'$ via some function f). Thus, we assume that only three possible types of
constraints are used to equate arrays (or their elements), and that they appear
in recursive CHCs, that is:


$$inv_1(\vec{v}, \vec{a}) \wedge [(a' = a\ \wedge)^*] \wedge [(t = a[i]\ \wedge)^*] \wedge [(b' = store(b, i, f(\vec{v}, \vec{t}))\ \wedge)^*] \wedge \varphi(\vec{v}, \vec{v}', \vec{t}) \Rightarrow inv_2(\vec{v}', \vec{a}') \qquad (5)$$


where $*$ is the Kleene star, $a, b \in \vec{a}$, $a', b' \in \vec{a}'$, $t \in \vec{t}$, and $\varphi$ is over only non-array
variables. Note that sequences of stores (e.g., nested) could be supported after
some sort of CHC normalization, e.g., by introducing temporary uninterpreted
predicates and splitting C. Symbols $inv_1$ and $inv_2$ might refer to the same
predicate.

Queries There is only a single query among CHCs, and it has the form of either 
of the two implications: 


)= L1 (6) 
,U,d)) => 1 (7) 


In the body of the query, there is a quantifier-free conjunct $\varphi$ and a quantified
formula with subformulas R and Q. Formula $\varphi$ could represent the termination
condition of the array processing loop/recursion (captured in the other CHCs).
The subformulas R and Q could represent, respectively, a range of elements in
an array (giving a condition over possible index x of the array), and a property
over an array element (indexed using the x variable). We restrict read-accesses
of arrays to the quantified variable only.


8 In practice, the restrictions about array accesses and the shape of the CHC can
be relaxed, but this requires more careful handling than we propose in this paper.
Our implementation has it, but the paper omits it to maintain the simplicity of
presentation.

The formula in the query determines an initial candidate interpretation for
the predicate in the query. For instance, in (6) and (7), respectively:


$$inv \mapsto \lambda \vec{v}, \vec{a}.\ \forall x.\ (R(x, \vec{v}) \Rightarrow Q(x, \vec{v}, \vec{a})) \qquad (8)$$
$$inv \mapsto \lambda \vec{v}, \vec{a}.\ \exists x.\ (R(x, \vec{v}) \wedge \neg Q(x, \vec{v}, \vec{a})) \qquad (9)$$


Applying Algorithm 1. We assume that an iteration of the algorithm deals with a
mapping M and the following CHC, where $M(inv_1)$ might be currently $\top$, but
$M(inv_2)$ is quantified:


$$inv_1(\vec{v}, \vec{a}) \wedge \varphi(\vec{v}, \vec{a}, \vec{v}', \vec{a}') \Rightarrow inv_2(\vec{v}', \vec{a}') \qquad (10)$$


Abductive strengthening is needed when the following implication is not valid
on substitutions of interpretations of $inv_1$ and $inv_2$ (line 6 of Algorithm 1), thus
making it necessary to find $\psi$ such that the following is valid:


$$\psi \wedge \varphi \Rightarrow M(inv_2) \qquad (11)$$


Intuitively, for $\psi$ our algorithm reuses the quantified structure of $M(inv_2)$.
For all quantifier-free conjuncts of $M(inv_2)$, strengthening is done following
simple abduction, as e.g. in [17]. For quantified formulas, the algorithm is
trickier. In the rest of this section, we assume that the algorithms are strengthening
w.r.t. formulas $\pi_\exists$ and $\pi_\forall$ having the forms (9) and (8), respectively.


6.2 Core Technique 


The basic principle behind our quantified abductive strengthening is the
preservation of the range. That is, if the quantified formula on the right side
of (11) has form (9) or (8), then it intuitively means that some property $Q(x, \vec{v}, \vec{a})$
should hold either for all elements of the arrays (when quantification is universal),
or for some elements of the arrays $\vec{a}$ (when quantification is existential), determined by
$R(x, \vec{v})$. Thus, an interpretation of predicate $\psi$ on the left side of (11) should
also constrain all elements of (some) arrays belonging to the same range.

Since by our syntax restrictions we allow elements of arrays $b \in \vec{a}$ to be
rewritten using only a single index $i$, each constraint $b' = store(b, i, f(\vec{v}, \vec{t}))$ can
be safely replaced in the CHC body by:


$$b'[i] = f(\vec{v}, \vec{t}), \quad \text{and} \quad \forall j.\ i \ne j \Rightarrow b'[j] = b[j]$$
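The replacement of a store constraint by a point-wise equation plus a frame condition can be checked exhaustively on small arrays. The sketch below models arrays as fixed-length Python tuples, which is an assumption made only for illustration; the helper names are hypothetical.

```python
from itertools import product


def store(b, i, value):
    """Functional array update: a copy of b with position i set to value."""
    updated = list(b)
    updated[i] = value
    return tuple(updated)


def split_constraints(b, b_new, i, value):
    """The two constraints that replace b_new = store(b, i, value):
    the written cell, and the frame condition for every other index."""
    return b_new[i] == value and all(
        b_new[j] == b[j] for j in range(len(b)) if j != i)


def agree_on_small_arrays(length=3, values=(0, 1)):
    """Compare both formulations on every pair of small arrays."""
    arrays = list(product(values, repeat=length))
    return all(
        (b_new == store(b, i, v)) == split_constraints(b, b_new, i, v)
        for b in arrays for b_new in arrays
        for i in range(length) for v in values)
```

The first conjunct feeds the abduction query for the rewritten element, and the second one feeds the query for all untouched elements.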
In the following, we use an auxiliary mapping to reduce abduction
over array and integer variables to purely integer abduction.


Definition 10. Let $\vec{a}$, $\vec{t}$, and $\vec{e}$ be sets of array variables, integer variables, and
integer terms, respectively, all of the same cardinality. A bijection $SS : \vec{e} \to \vec{t}$ is
called a select-substitution w.r.t. index $i$, if for every $a \in \vec{a}$, there exists $t \in \vec{t}$ such
that $SS(a[i]) = t$.

Algorithm 2: ABDUCE($\pi$, $\vec{x}$, S)

Input: $\pi$ — abducible formula of the form (11) from a CHC C, $\vec{x}$ — variables
       to keep, S — set of CHCs over $\mathcal{R}$, where $C \in S$, M — mapping from $\mathcal{R}$
       to predicates

Output: $\psi$ — a strengthening for $\pi$

 1  $(\pi_1, \pi_2) \leftarrow$ decompose $\pi$ into (12) and (13);
 2  for $k \in [1, 2]$ do
 3      $\pi'_k \leftarrow$ unquantify and apply some SS to $\pi_k$;
 4      $\psi'_k \leftarrow$ solve integer abduction for $\pi'_k$;
 5      $\psi_k \leftarrow$ apply $SS^{-1}$ to $\psi'_k$, and replace $i$ by $x$;
 6  $\sigma \leftarrow$ COMPUTERANGEFORMULA(S);
 7  $\theta \leftarrow$ GETCONDITION(C);
 8  if $\pi$ universally quantified then
 9      $\psi_1 \leftarrow \psi_1 \vee \sigma \vee \neg\theta$;
10      $\psi_2 \leftarrow \psi_2 \vee \neg\sigma \vee \neg\theta$;
11      $\psi \leftarrow \forall x.\ \psi_1 \wedge \forall x.\ \psi_2$;
12  if $\pi$ existentially quantified then
13      $\psi_1 \leftarrow \psi_1 \wedge \neg\sigma \wedge \theta$;
14      $\psi_2 \leftarrow \psi_2 \wedge \sigma \wedge \theta$;
15      $\psi \leftarrow \exists x.\ \psi_1 \vee \exists x.\ \psi_2$;
16  return $\psi$;


The pseudocode of our range abduction is given in Algorithm 2. Below we
discuss its details.


Universally-quantified formulas. The abduction query $\pi_\forall$ of the form (11) can
be decomposed (line 1) into two stronger abduction queries, $\pi_1$, $\pi_2$:


$$\psi_1 \wedge [(b'[i] = f(\vec{v}, \vec{t})\ \wedge)^*] \ldots \Rightarrow (R(i, \vec{v}') \Rightarrow Q(i, \vec{v}', \vec{a}')) \qquad (12)$$


$$\psi_2 \wedge [(b'[i] = b[i]\ \wedge)^*] \ldots \Rightarrow (R(i, \vec{v}') \Rightarrow Q(i, \vec{v}', \vec{a}')) \qquad (13)$$


Since $M(inv_2)$ is universally quantified and due to our syntactic restrictions,
only the i-th elements of any source arrays are relevant for the abduction query.
Thus, without loss of generality, our algorithm lowers the (possibly) universally
quantified formula in $M(inv_2)$ to a quantifier-free formula over the i-th array
element, and further replaces all the array access terms of the form $a[i]$ by integer
terms $a_i$ using a select-substitution SS, essentially boiling down to two abduction
queries over pure integer arithmetic with abducibles $\pi'_1$ and $\pi'_2$ (lines 3, 4).

After the abduction solver returns $\psi'_1$ and $\psi'_2$ for the integer arithmetic
queries, the $SS^{-1}$ mapping is applied to replace integer terms $a_i$ by array terms
$a[i]$ to get $\psi_1$ and $\psi_2$ that constitute solutions to queries (12) and (13) (line 5).

It remains finally to re-introduce the universal quantifier for $x$ to $\psi_1[x/i]$ and
$\psi_2[x/i]$ to get a solution to our main abduction query (11). There are several
ways to do it. One way is to not introduce quantifiers for $\psi_1$, as the query (12)
captures the effect of a single store to the i-th element of an array. For $\psi_2$, then,
the quantifier's range will span over all the original range except $i$. However,
this way, seemingly obvious, does not work in practice because the produced
invariant is unlikely to be inductive.

Another way is to split the range into two segments with the border at $i$. It
would intuitively correspond to the range formula computation of [22], i.e., the
sub-array that has already been processed in the loop encoded by the CHC, and
the sub-array that remains to be processed. The former restricts the range of $\psi_2$
(lines 10, 14) and the latter of $\psi_1$ (lines 9, 13). More formally:


Definition 11. For an inductive CHC C with loop counter $i$, where $i$ is in the
interval $[l, u]$, and a free variable $j$, the range formula is $j < i$ when $i > l$ is
inductive at C, and $j > i$ when $i < u$ is inductive at C.


In Algorithm 2, $\sigma$ is the range formula returned by COMPUTERANGEFORMULA.
Additionally, GETCONDITION adds predicates that are present in the
constraint of the CHC (like $2 * i < N$) after substituting the loop counters in
them by the quantified variables.


Existentially-quantified formulas (9) Similar to the universally-quantified case, 
the abduction query for existential quantification will be decomposed into 
two abduction queries. Queries and in this case have the form: 


The remainder of the algorithm in this case is the same as in the universally- 
quantified case with the exception that we disjoin two quantified solutions for 
the abduction queries before checking if it is inductive. 


6.3 Houdini Algorithm 


The strengthening performed by Algorithm 2 might result in a candidate
invariant that is too strong for already validated CHCs. To resolve this, Algorithm 1 weakens
the candidate invariants by using an existing algorithm called HOUDINI [24] (line 7).
Given a set of relations R and a mapping M, HOUDINI recursively weakens M until
it is inductive at each CHC C whose rel(head(C)) $\in$ R. It does this by finding
a counterexample to inductiveness and dropping the conjuncts that do not satisfy
the counterexample.
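The weakening loop can be sketched on an explicit finite transition system. This is a minimal illustration of the Houdini idea under strong assumptions: states are enumerated rather than found by an SMT solver, the candidate is a dictionary of named conjuncts, and the names `houdini`, `states`, and `step` are hypothetical rather than taken from the paper or from [24].

```python
def houdini(conjuncts, states, step):
    """Drop conjuncts until the remaining conjunction is closed under `step`.

    conjuncts: dict mapping a name to a predicate over states
    states:    iterable of all states to consider
    step:      function returning the successor states of a state
    """
    kept = dict(conjuncts)
    changed = True
    while changed:
        changed = False
        for s in states:
            if not all(p(s) for p in kept.values()):
                continue  # s is outside the current candidate
            for t in step(s):
                # (s, t) is a counterexample to inductiveness if t breaks a conjunct.
                broken = [name for name, p in kept.items() if not p(t)]
                if broken:
                    for name in broken:
                        del kept[name]
                    changed = True
                    break
            if changed:
                break  # restart the scan with the weakened candidate
    return kept


# Toy system: a counter that increases by 2 and stays below 10.
candidate = {
    "even": lambda s: s % 2 == 0,
    "small": lambda s: s < 5,      # not inductive: 4 steps to 6
    "nonneg": lambda s: s >= 0,
}
result = houdini(candidate, range(10), lambda s: [s + 2] if s + 2 < 10 else [])
```

In this toy run only the conjunct `small` is dropped, and the surviving conjunction of `even` and `nonneg` is inductive.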


6.4 Illustration of Existentially Quantified Precondition Inference 


We end this section by illustrating Algorithm 1 on an existentially quantified
postcondition from Fig. 4. The CHCs of this program are given in Fig. 5.

The algorithm chooses an initial candidate for inv from the query. The loop
condition is dropped as for universal quantification, but the range formula is not
conjoined for an existential postcondition as this often results in a precondition
that is too strong, viz. $\bot$.

$$inv \mapsto \lambda i, N, A, B, C.\ \exists j.\ 0 \le j < N \wedge A[j] \ne B[j]$$

$cpre(N, A, B, C) \wedge i = 0 \Rightarrow inv(i, N, A, B, C)$
$inv(i, N, A, B, C) \wedge i < N \wedge 2 * i < N \wedge C' = store(C, i, i) \wedge i' = i + 1 \Rightarrow inv(i', N, A, B, C')$
$inv(i, N, A, B, C) \wedge i < N \wedge 2 * i \ge N \wedge A' = store(A, i, C[i]) \wedge i' = i + 1 \Rightarrow inv(i', N, A', B, C)$
$inv(i, N, A, B, C) \wedge \neg(i < N) \wedge \neg(\exists j.\ 0 \le j < N \wedge A[j] \ne B[j]) \Rightarrow \bot$

Fig. 5: CHC encoding of program in Fig. 4

Algorithm 1 now checks if either the second or third CHC in the Worklist
is not inductive. Since the third CHC is not inductive, ABDUCE is called. The
results of the two abduction queries corresponding to the i-th element and the
non-i-th elements, i.e. $\psi_1$ and $\psi_2$, will be $B[j] \ne C[j]$ and $A[j] \ne B[j]$, respectively. Further,
quantification and range formulas are added, which will result in the candidate:


$$\begin{aligned}
inv \mapsto \lambda i, N, A, B, C.\ & (\exists j.\ 0 \le j < N \wedge A[j] \ne B[j]) \wedge {} \\
& \big( (\exists j.\ 0 \le j < N \wedge j \ge i \wedge 2 * j \ge N \wedge B[j] \ne C[j]) \vee {} \\
& \ \ (\exists j.\ 0 \le j < N \wedge j < i \wedge 2 * j \ge N \wedge A[j] \ne B[j]) \big)
\end{aligned}$$


Now, the HOUDINI algorithm finds that the candidate is not inductive at the
third CHC. For instance, it finds a counterexample to validity of the form:


$a[j] \ne b[j]$ for $j = i$, otherwise $a[j] = b[j]$

$b[j] \ne c[j]$ for $j = i + 2$, otherwise $b[j] = c[j]$


It drops the conjunct $\exists j.\ 0 \le j < N \wedge A[j] \ne B[j]$ that does not satisfy the
counterexample. The rest are found to be inductive at the third and second
CHCs.


$$\begin{aligned}
inv \mapsto \lambda i, N, A, B, C.\ & (\exists j.\ 0 \le j < N \wedge j \ge i \wedge 2 * j \ge N \wedge B[j] \ne C[j]) \vee {} \\
& (\exists j.\ 0 \le j < N \wedge j < i \wedge 2 * j \ge N \wedge A[j] \ne B[j])
\end{aligned}$$


Finally, the precondition cpre is computed from the first CHC by substituting
$i = 0$, resulting in:


$$cpre \mapsto \lambda i, N, A, B, C.\ \exists j.\ 0 \le j < N \wedge 2 * j \ge N \wedge B[j] \ne C[j]$$


7 Maximal Preconditions 


The interpretation of pre generated by Algorithm 1 is guaranteed to be a
precondition by Theorem 2, but it could be non-maximal. That is, it may exclude
some initial states from which the postcondition holds. In this section, we propose
a technique that checks whether a precondition is maximal (i.e. logically
weakest). If not, it incrementally weakens the precondition in a loop until it
becomes maximal.

Algorithm 3: MAXIMALPRECOND(S, pre)

Input: S — set of CHCs over $\mathcal{R}$, $pre \in \mathcal{R}$ — precondition relation
Output: M(pre) — maximal precondition for pre

 1  $M \leftarrow$ INFERABD(S, $\{\lambda r \in \mathcal{R}.\ \top\}$);
 2  $\overline{M} \leftarrow$ INFERABD($\overline{S}$, $\{\lambda r \in \mathcal{R}.\ \top\}$);
 3  $\phi \leftarrow \neg(\neg M(pre) \Rightarrow \overline{M}(cpre))$;
 4  while CHECKSAT($\phi$) do
 5      $ctm \leftarrow$ GETMODEL($\phi$);    // ctm is of the form $\bigwedge_{0 \le i < n} t_i = c_i$
 6      $postViolated \leftarrow$ UNROLLCHC($S_{ctm}$, $\overline{S}_{ctm}$);
 7      if $postViolated$ then
 8          $\overline{M} \leftarrow$ WEAKEN($\overline{S}$, $\overline{M}(cpre) \vee ctm$)
 9      else
10          $M \leftarrow$ WEAKEN(S, $M(pre) \vee ctm$)
11      $\phi \leftarrow \neg(\neg M(pre) \Rightarrow \overline{M}(cpre))$;
12  return M(pre);


7.1 Overview 


Algorithm 3 gives a description of the maximality checker. Given a precondition
inference problem via a system of CHCs S, it returns a maximal precondition
on termination. It first generates a precondition for S using Algorithm 1. In
order to check whether the precondition is maximal, the algorithm infers another
precondition for the complement CHC system $\overline{S}$ (line 2). Recall from Definition 5
that this system has the same structure as S except that the postcondition in the
query is complemented. To avoid confusion, we assume that pre of this system is
substituted by another uninterpreted relation with the same arity, cpre. For
example, Fig. 5 is the complement CHC system of Fig. 3.

The maximality check is performed next by checking whether all the states
that are outside $M(pre)$ are in $\overline{M}(cpre)$ (line 4). Intuitively, if all the states in
$\neg M(pre)$ are in $\overline{M}(cpre)$ then those states violate the postcondition, as $\overline{M}(cpre)$
is the precondition of the complement postcondition. The validity check is
reduced to a satisfiability check by negation, and the model of the satisfiability
check is called a counterexample-to-maximality, or CTM.

The algorithm uses the CTM to determine which of pre or cpre has to
be weakened by invoking the method UNROLLCHC (line 6). Intuitively,
UNROLLCHC performs a task similar to executing the program represented by
CHCs with the CTM as the initial state. More precisely, UNROLLCHC will find
unrollings (Definition 7) of different lengths for the extensions $S_{ctm}$ and $\overline{S}_{ctm}$
and terminates when an unrolling is satisfiable. It then returns whether the
unrolling was from $S_{ctm}$ or $\overline{S}_{ctm}$. For a deterministic CHC system (Definition 8),
a satisfiable unrolling exists either for $S_{ctm}$ or for $\overline{S}_{ctm}$.

In the next step, the algorithm will weaken cpre if the unrolling is from $\overline{S}_{ctm}$,
or pre if the unrolling is from $S_{ctm}$. The weakening is performed by WEAKEN,
which will be called with an appropriate CHC system and the current interpretation
for the precondition (lines 10, 8). WEAKEN will generalize the precondition
and find inductive invariants. This loop of checking for a CTM and weakening one
of the preconditions continues until the maximal precondition is found.
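The interplay of the maximality check, the CTM, and the unrolling can be sketched on bounded inputs. This is an illustration under assumptions, not the algorithm as implemented: states are enumerated instead of obtained from an SMT model, running the assumed loop of the running example plays the role of UNROLLCHC, and weakening is the trivial one that adds the single CTM point (the paper's WEAKEN generalizes instead). All function names are hypothetical.

```python
from itertools import product


def post_holds(N, A, B, C):
    """Stand-in for UNROLLCHC: run the assumed loop and test the postcondition."""
    A, C = list(A), list(C)
    for i in range(N):
        if 2 * i < N:
            C[i] = i
        else:
            A[i] = C[i]
    return all(A[j] == B[j] for j in range(N))


def pre(N, A, B, C):
    return (all(A[j] == B[j] for j in range(N) if 2 * j < N)
            and all(B[j] == C[j] for j in range(N) if 2 * j >= N))


def cpre(N, A, B, C):
    return any(B[j] != C[j] for j in range(N) if 2 * j >= N)


def maximality_loop(states):
    """Repeat: find a state in neither pre nor cpre (a CTM), execute from it,
    and add it to whichever side the execution justifies."""
    extra_pre, extra_cpre = set(), set()
    while True:
        ctm = next((s for s in states
                    if not (pre(*s) or s in extra_pre)
                    and not (cpre(*s) or s in extra_cpre)), None)
        if ctm is None:
            return extra_pre, extra_cpre
        (extra_pre if post_holds(*ctm) else extra_cpre).add(ctm)


# All states with N <= 2 and array cells in {0, 1} (bounds are an assumption).
STATES = [(N, c[:N], c[N:2 * N], c[2 * N:])
          for N in range(3) for c in product((0, 1), repeat=3 * N)]
extra_pre, extra_cpre = maximality_loop(STATES)
```

On these bounded states every CTM ends up on the cpre side, including the state with N = 1, A[0] = 0, B[0] = 1, C[0] = 0, so pre itself never needs weakening here.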


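Theorem 3. The precondition returned by Algorithm 3, when it terminates, is
maximal when S is deterministic and terminating.


Example 3. In Sect. 5.2 and Sect. 6.4, Algorithm 1 found the following
interpretations for pre and cpre:


$$\begin{aligned}
pre \mapsto \lambda N, A, B, C.\ & \forall j.\ (0 \le j < N \wedge 2 * j < N \Rightarrow A[j] = B[j]) \wedge {} \\
& \forall j.\ (0 \le j < N \wedge 2 * j \ge N \Rightarrow B[j] = C[j]).
\end{aligned}$$


$$cpre \mapsto \lambda i, N, A, B, C.\ \exists j.\ 0 \le j < N \wedge 2 * j \ge N \wedge B[j] \ne C[j].$$


The reader may notice that cpre is not maximal, hence it is not possible to
check whether pre is maximal. We now illustrate how Algorithm 3 determines
this.


After finding the interpretations, Algorithm 3 checks whether the following implication is valid:


$$\begin{aligned}
& \neg\big( (\forall j.\ 0 \le j < N \wedge 2 * j < N \Rightarrow A[j] = B[j]) \wedge (\forall j.\ 0 \le j < N \wedge 2 * j \ge N \Rightarrow B[j] = C[j]) \big) \\
& \Rightarrow \\
& \exists j.\ 0 \le j < N \wedge 2 * j \ge N \wedge B[j] \ne C[j]
\end{aligned}$$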


Since this formula is satisfiable, the algorithm deduces that at least one
of M(pre) and M(cpre) is not maximal. Suppose it gets the following
satisfying model, or CTM:


N = 1 ∧ A[0] = 0 ∧ B[0] = 1 ∧ C[0] = 0.


UNROLLCHC finds that the CHCs violate the property when the CTM is
the initial state. Hence, cpre, the precondition of the negation of the
property, can be weakened by at least one point, viz. the CTM.
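
To make the role of the CTM concrete, the following Haskell sketch evaluates pre and cpre on the model N = 1, A[0] = 0, B[0] = 1, C[0] = 0. It assumes the reading of the two interpretations given in this example (index range 0 ≤ j < N, split at 2*j < N versus 2*j ≥ N). Arrays are modelled as total functions and quantifiers are expanded over the finite index range; both choices are assumptions of the sketch and not how the tool works, which reasons symbolically.

```haskell
-- Arrays are modelled as total functions from index to value (an assumption
-- of this sketch; the tool itself reasons about arrays symbolically).
type Arr = Int -> Int

-- pre: for every j in [0, N): 2*j <  N implies A[j] = B[j],
--                         and 2*j >= N implies B[j] = C[j].
pre :: Int -> Arr -> Arr -> Arr -> Bool
pre n a b c =
  and [a j == b j | j <- [0 .. n - 1], 2 * j < n]
    && and [b j == c j | j <- [0 .. n - 1], 2 * j >= n]

-- cpre before weakening: some j in [0, N) with 2*j >= N and B[j] /= C[j].
cpre :: Int -> Arr -> Arr -> Arr -> Bool
cpre n _ b c = or [b j /= c j | j <- [0 .. n - 1], 2 * j >= n]

-- The CTM from the text: N = 1, A[0] = 0, B[0] = 1, C[0] = 0.
ctmCovered :: (Bool, Bool)
ctmCovered = (pre 1 a b c, cpre 1 a b c)
  where
    a = const 0
    b = const 1
    c = const 0
```

Under these assumptions both components of `ctmCovered` are `False`: the state lies outside both interpretations, which is why at least one of them must be weakened.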
Algorithm 4: WEAKEN(S, M(pre) ∨ ctm)


Input: S, a set of CHCs over R; M(pre) ∨ ctm
Output: M′, a solution to S with M(pre) ∨ ctm ⟹ M′(pre)


1 G ← CONSTRUCTGRAMMAR(S, M(pre));
2 while ⊤ do
3     φ ← NEXTCANDIDATE(G);
4     if CHECKSAT(¬(ctm ⟹ φ)) then continue;
5     ρ ← M(pre) ∨ φ;
6     for i ∈ [0 ··· n] where R = {r0 = pre, r1 ··· rn} do
7         M′(ri) ← INVINFER(S, M′, ri) or ρ for r0;
8     if ∃C ∈ S_ρ. CHECKSAT(¬(body(C) ⟹ head(C))[M′/R]) then continue;
9     return M′;


7.2 Weakening of Precondition 


Once the precondition that has to be weakened is determined, a trivial weakening
is to add the CTM to the current interpretation. However, this may cause
non-termination, as there can be infinitely many CTMs. In this section, we propose
a heuristic, Algorithm 4, that can accelerate the weakening process.

Algorithm 4 works in two stages. First, it finds a formula ρ that is generally
weaker than the trivial solution M(pre) ∨ ctm (lines 3–5). To do this, it
enumerates (line 3) a formula φ from an input grammar G (a sample grammar is given
in ) and then checks if it is weaker. Then, it finds inductive invariants M′
(line 7) for the extended system S_ρ (recall Definition 6) using a slightly modified
version of range abduction (algorithmic description is in [49]). By the relevant
lemma, ρ and M′ together form a solution to the input system S.
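
The control flow of the two stages can be sketched in Haskell as follows. This is only an illustration under strong simplifying assumptions: states are plain integers, formulas are Boolean predicates, the CTM is a single state, and the function `inferInvs` is a hypothetical stand-in for invariant inference plus the final inductiveness check. None of these names come from the tool.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- Formulas are predicates over integer "states" (an assumption of the sketch).
type Formula = Int -> Bool

-- Try candidates in order and return the first weakened precondition for
-- which the (hypothetical) invariant inference succeeds.
weaken
  :: [Formula]                -- candidates, standing in for NEXTCANDIDATE(G)
  -> Int                      -- the CTM, here a single state
  -> (Formula -> Maybe inv)   -- stand-in for INVINFER plus the final check
  -> Formula                  -- current interpretation M(pre)
  -> Maybe (Formula, inv)
weaken candidates ctm inferInvs current = listToMaybe (mapMaybe try candidates)
  where
    try phi
      | not (phi ctm) = Nothing                     -- candidate must cover the CTM
      | otherwise =
          let rho s = current s || phi s            -- rho = M(pre) or phi
           in fmap (\invs -> (rho, invs)) (inferInvs rho)

-- Toy usage: the "system" is unsafe whenever state 5 is admitted.
example :: Maybe (Bool, Bool)
example = fmap probe (weaken [(> 10), (>= 0), (== 3)] 3 toyInfer (== 1))
  where
    toyInfer rho = if rho 5 then Nothing else Just ()
    probe (rho, ()) = (rho 3, rho 0)
```

In the toy usage the first candidate is skipped because it does not cover the CTM, the second is rejected because it admits the unsafe state, and the third is accepted. Because the candidate list is consumed lazily, it may be infinite, mirroring the unbounded loop of the algorithm.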


Theorem 4. Algorithm 4 returns a solution M′ to S such that
M(pre) ∨ ctm ⟹ M′(pre).


Example 4. We continue the illustration of Example 3. Algorithm 4 is called with
a complement CHC system (see the corresponding figure),
M(cpre) ↦ λi, N, A, B, C. ∃j. 0 ≤ j < N ∧ 2*j ≥ N ∧ B[j] ≠ C[j], and
ctm = N = 1 ∧ A[0] = 0 ∧ B[0] = 1 ∧ C[0] = 0.
Suppose that the algorithm samples φ as ∃j. 0 ≤ j < N ∧ 2*j < N ∧ A[j] ≠ B[j]
based on the constraints from the query and the second CHC. Since the check that
ctm implies φ passes, ρ will be assigned:


∃j. 0 ≤ j < N ∧ 2*j ≥ N ∧ B[j] ≠ C[j] ∨ ∃j. 0 ≤ j < N ∧ 2*j < N ∧ A[j] ≠ B[j].


INVINFER uses the postcondition to compute M[r_{i+1}] and φ to compute
M[r_{i−1}]. It then adds j < i to the former, j ≥ i to the latter, and disjuncts them
(due to existential quantification) to get:


inv ↦ λi, N, A, B, C. ∃j. 0 ≤ j < i ∧ A[j] ≠ B[j] ∨
                      ∃j. i ≤ j < N ∧ 2*j ≥ N ∧ B[j] ≠ C[j] ∨
                      ∃j. i ≤ j < N ∧ 2*j < N ∧ A[j] ≠ B[j]


Since this is inductive at all CHCs, the algorithm returns with ρ and inv.
Algorithm 3 will perform its check and find that pre is maximal.


8 Evaluation 


Tool We implemented our algorithms in a tool called PREQSYN on top of the
FREQHORN framework [22]. Our tool takes as input a precondition-inference
problem encoded as a set of CHCs. It uses Z3 to solve SMT queries. Quan- 
tifier elimination is performed using the solver from [20] that uses model-based 
projection [5]. On a successful execution, our tool infers maximal preconditions 
and inductive invariants for the loops. 


Evaluation Goals We evaluate PREQSYN on the following research questions: 


RQ1 Can PREQSYN infer universal and existential preconditions? How many 
of them can it prove maximal? 

RQ2 Can PREQSYN compete with existing maximal quantified precondition 
inference tools? 

RQ3 How challenging is it for state-of-the-art tools to infer invariants when
preconditions are given?

RQ4 How do various modules of PREQSYN influence its performance? 


Benchmarks and Configuration We use 32 precondition inference problems, 29 with
universally quantified and 3 with existentially quantified postconditions. Since
none of the benchmarks from [54] had quantified postconditions, we derived the
majority (26/32) of our benchmarks from the existing verification benchmarks
of [22], which have been collected from various sources such as SV-COMP. In
particular, we considered 48 benchmarks from the public repository that have
multiple loops, i.e., the first loop performs an array initialization, and the
other loops involve various types of array processing such as copying,
modifying, filtering, and searching among the elements. We then excluded the
first (initialization) loop from each benchmark, thus making it necessary to
synthesize a quantified precondition that intuitively describes how the arrays
need to be initialized in order to meet the postcondition. We further excluded
benchmarks that gave repetitive problems (8/48) or did not meet our syntactic
restrictions, viz. had non-quantified postconditions (6/48), had nested loops
(5/48), or had non-linear expressions (3/48). We added 6 new benchmarks to test
different features of our tool.
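
The benchmark counts stated above are mutually consistent, as the following small Haskell tally shows. The identifier names are ours and only label the figures quoted in the text.

```haskell
-- Figures quoted in the text; the names are illustrative labels only.
considered, repetitive, nonQuantified, nestedLoops, nonLinear, newlyAdded :: Int
considered    = 48  -- multi-loop benchmarks taken from the public repository
repetitive    = 8   -- excluded: repetitive problems
nonQuantified = 6   -- excluded: non-quantified postconditions
nestedLoops   = 5   -- excluded: nested loops
nonLinear     = 3   -- excluded: non-linear expressions
newlyAdded    = 6   -- new benchmarks added by the authors

-- Benchmarks derived from existing verification benchmarks.
derived :: Int
derived = considered - (repetitive + nonQuantified + nestedLoops + nonLinear)

-- Size of the final benchmark set.
total :: Int
total = derived + newlyAdded
```

Here `derived` evaluates to 26 and `total` to 32, matching the 26/32 and 32 reported in the text.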

We performed the experiments on an Ubuntu 20.04 machine with a 2.5 GHz 
processor and 16GB memory. A timeout of 100s was given to all the tools. 


RQ1 PREQSYN inferred a precondition for 31/32 benchmarks. The failed benchmark
timed out in the inductiveness check. Out of the 31 preconditions, 22 were
proved to be maximal automatically. All the successful benchmarks were completed
within 5 seconds. Overall, PREQSYN solved 31 CHC tasks with universally
quantified postconditions and 30 with existentially quantified postconditions,
corresponding to pre and cpre.

On manual inspection of the 9 benchmarks for which PREQSYN found a precondition
but was unable to prove maximality, 5 were found to be non-deterministic.
However, the inferred preconditions for them were sufficiently weak. The
remaining 4 failed in different stages of weakening cpre. Among these
benchmarks, we found that 3/4 preconditions (i.e., pre) were actually maximal.²


RQ2 We ran P-GEN (with Z3 v2.0 as its SMT solver) on semantically equiva- 
lent C programs manually constructed from the CHCs. P-GEN found only 2/32 
preconditions as maximal. Both of them were existentially quantified. It timed 
out on 5/32 benchmarks. On the remaining 25/32 benchmarks it exited without 
finding a precondition. Overall, PREQSYN inferred significantly more precondi- 
tions than P-GEN due to the generalization capability of range abduction. 


RQ3 We tried to replace our invariant inference technique with an existing one,
thus evaluating the need to discover our invariants. Existing state-of-the-art
CHC solving tools can handle arrays to some extent, namely SPACER [32] (Z3
v4.8.10), a PDR-based invariant inference tool, and FREQHORN [22] (v.0.6), a
SyGuS-based invariant inference tool. So we pose to them the simpler problem of
inferring invariants with preconditions. Furthermore, we also pose this
as an assertion checking problem to VERIABS (v1.4.2), a portfolio solver
that targets linear loops, the gold winner of the SV-COMP 2022 ReachSafety
category [4], and the winner of the array category for several years.

To create invariant inference and verification problems, we consider the 42
precondition inference problems corresponding to pre and cpre for which PREQSYN
was able to find the maximal preconditions. These 42 problems were converted
manually to verification problems by using the maximal preconditions. For
SPACER, the CHCs were annotated with the maximal interpretations of pre and
cpre; for VERIABS, semantically equivalent C programs with the maximal
preconditions as loops were provided; and for FREQHORN, the original CHCs
were provided as input.

Out of 42 problems, 21 each of universally and existentially quantified post- 
conditions, VERIABS solved 37, FREQHORN solved 20, and SPACER solved 11. 


RQ4 We disabled the HOUDINI algorithm at line 7 of Algorithm 1, and PREQSYN
found preconditions for 27 benchmarks compared to 31 with the range abduction
algorithm. Out of the 27, only 6 were proved maximal. We conclude that weakening
by HOUDINI is useful, especially when postconditions are existentially quantified.
We extended the SMT-based maximality checking algorithm from [50], but it was
unsuccessful in proving the 21 problems that our maximality checker proved.


9 Related Work 


The problem of precondition inference appears in multiple applications and has 
been the subject of numerous works. Broadly, these works can be classified as 


² Detailed results of the evaluation, with timings, can be found in [49].
static [45,14,54,13], dynamic [53,25,3,41], and a mix of both [46]. Our technique
falls in the first category. The two works closest to ours are [14] and [54], which
compute maximal quantified preconditions for array programs using abstract
interpretation and CEGAR-based predicate abstraction, respectively. Unlike the
technique in [14], our work does not require predefined abstract domains. The
CEGAR-based technique computes over-approximations of safe and unsafe states (i.e.,
over-approximations of pre and cpre) and then refines them until they become
disjoint. The over-approximations are computed using predicate abstraction and 
the predicates required for the refinement of the abstraction are derived from 
a set of heuristic rules. Our technique differs from theirs in several ways: we 
rely on abduction-based techniques to infer necessary predicates, while they rely 
on minimal unsat cores; we infer quantified inductive invariants that witness 
the correctness of the inferred preconditions, while their technique does not; 
finally, we target quantified postconditions while they consider only quantifier- 
free postconditions. 


The problem of inferring universally quantified inductive invariants has re- 
ceived considerable attention. The inference is made using methods such as ab- 
stract interpretation [80], predicate abstraction using Skolem constants [40] and 
interpolation , an extension of IC3 for arrays [32], and syntax-guided synthe- 
sis [22]. These techniques, apart from being restricted to universal quantification, 
also expect a precondition. Our technique overcomes these limitations by infer- 
ring preconditions including existentially quantified ones. 


Many techniques verify programs with arrays by transforming them to a 
sound abstraction without explicitly generating inductive invariants. The ab- 
straction can be obtained by considering all the array elements as a single cell [7], 
or multiple fixed cells and then converting to array-free nonlinear CHCs [43], 
overapproximating unknown loop bounds to a smaller known bound [39], acceler- 
ating entire transition relations [8], using CHC transformations and induc- 
tion based techniques [DOMI]. The portfolio solver VERIABs [I] used in our ex- 
periment predominantly used the shrinking [89] technique to verify, which does 
not generate invariants. The tool also has induction-based techniques [9,10,11]
that implicitly generate invariants, but these are not given to the user. RAPID
translates the semantics of the input program into formulas in trace logic. Then 
the formulas are verified using a theorem prover. Though sound lemmas are used 
to translate loops, it currently does not support the extraction of invariants from 
the lemmas. Apart from the inability to generate explicit invariants, all of these 
techniques need preconditions to verify the programs. 


Our technique works on CHCs, which have gained much attention in recent
years for different verification and inference tasks [57,36,51,21,19,50,32,22]. Most
of these techniques do not handle arrays, and those that do, do not generate
maximal preconditions.

The core part of our algorithm uses abductive inference. Abduction has been
used for programs without arrays to infer invariants [17,18], preconditions [27,16],
and specifications [2,50]. One of these techniques finds specifications over
uninterpreted functions by overcoming the limitation of integer abduction engines
through a data-driven approach. In contrast, our technique extends abduction
itself to quantified formulas over arrays.

Recent works in specification synthesis use artifacts such as input-output
examples, comments in the code, partial code snippets, and user-supplied
constraints and languages to infer specifications [12,55,47]. In comparison, our
work uses the entire program and a postcondition expressed as a logical formula
to find maximal preconditions.


10 Limitations and Future Work 


The restriction on array access statements simplifies the conversion between
array and integer terms in range abduction. However, this can be relaxed to
support terms like a[b[i]] and a[i+1], among others, by enhancing the
select-substitution (recall Def. 10).

The restriction on the form of postconditions, inductive invariants, and
preconditions is required for effective range abduction and SMT checks. Our
approach can easily support alternating quantifiers if the structure of the
postcondition is close to the inductive invariant.

For non-deterministic programs, Algorithm 4 will not terminate when a CTM
has two satisfiable unrollings, one for each extension (refer to Definition 8).
Hence, the maximality check will be inconclusive. Nevertheless, Algorithm 1 can
still generate preconditions (with inductive invariants) for such programs, often
maximal ones, as observed in our experiments. We extend our approach to
non-deterministic CHCs in [48].

In the case of non-terminating programs, an initial state with non-terminating 
execution can be added to either pre or cpre, as it will have inductive invariants 
for both. If added to the latter, the maximality check could wrongly conclude 
that pre is maximal when it’s not. Therefore, relaxing this restriction affects 
the soundness of the maximality check. An interesting future direction for max- 
imality checking would be to extend the work presented in [29] to incorporate 
array handling. 


Data Availability and Artifact 


The artifact accompanying the paper is publicly available at [52]. 
Abstract. Inlining is a crucial optimisation when compiling functional
programming languages. This paper describes how we have implemented 
and verified function inlining and loop specialisation for PureCake, a 
verified compiler for a Haskell-like (purely functional, lazy) programming 
language. A novel aspect of our formalisation is that we justify inlining by 
pushing and pulling let-bindings. All of our work has been mechanised 
in the HOL4 interactive theorem prover. 
1 Introduction


It can be tricky to generate high-quality code from lazy, purely functional pro- 
grams for a number of reasons. One of these reasons is that functional program- 
ming encourages a brief declarative style that makes heavy use of shorthands 
(e.g., for partially-applied functions) and higher-order functions [8]. Producing 
good code from such input requires a well-developed inliner, as noted [17] by the 
developers of the Glasgow Haskell Compiler (GHC): 


“One of the trickiest aspects of a compiler for a functional language is
the handling of inlining. [...] Effective inlining is particularly crucial in
getting good performance.”


This paper is about implementing and verifying an inliner that can specialise 
loops for PureCake, an end-to-end verified compiler for a Haskell-like language [10]. 


The inliner by example. The following simple example demonstrates what 
our inliner does. Imagine that a programmer is to write a function that incre- 
ments every element of a list of integers. The programmer should write: 


suc_list = map (+1) 


Here, the programmer has relied on the library function map below to perform 
the necessary list traversal. 


© The Author(s) 2024 
S. Weirich (Ed.): ESOP 2024, LNCS 14577, pp. 275-301, 2024. 
https://doi.org/10.1007/978-3-031-57267-8_11


map f [] = []
map f (x:xs) = f x : map f xs


To generate high-quality code for suc_list, the compiler must both inline and 
specialise map. Our inliner takes the definition of suc_list above and produces 
the following code. 


suc_list =
  let map' xs =
        case xs of
          []     -> []
          (y:ys) -> y + 1 : map' ys
  in map'


In particular, the inliner has combined the following code transformations: 


— selective expansion of function definitions at call sites; and 
— loop specialisation of recursive functions with known arguments (e.g., argu- 
ment f to map is always (+1) in suc_list). 
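
The effect of combining the two transformations can be observed directly in Haskell. The sketch below places the source-level definition next to the inlined and specialised form shown earlier; the names `sucList` and `sucList'` are ours, and ordinary Haskell is used as a stand-in for PureCake's input language.

```haskell
-- Source-level definition: relies on the library function map.
sucList :: [Int] -> [Int]
sucList = map (+ 1)

-- The shape produced after inlining map and specialising it to (+1):
-- the higher-order argument has disappeared from the loop.
sucList' :: [Int] -> [Int]
sucList' =
  let map' xs =
        case xs of
          []       -> []
          (y : ys) -> y + 1 : map' ys
   in map'
```

Both definitions agree on every input, including infinite lists consumed lazily, for example `take 3 (sucList' [1 ..])`.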


Contributions. Our work adds verified inlining and loop specialisation to Pure- 
Cake. Our inliner is capable of optimisations such as the one above. More specif- 
ically, we make the following contributions: 


1. We define and prove sound a relation that encapsulates an envelope of 
semantics-preserving inlinings (§ 4). This relation is independent of the 
heuristics of any real implementation. It is proved sound using a novel
formalisation of inlining as pushing/pulling of let-bindings.

2. We derive sound equational principles that allow us to lift out arguments 
which remain constant during recursion, such as f in map in the example 
above (§ 5). These principles are phrased such that they can be used in the 
relation above and have the effect of specialising loops. 

3. We implement an inliner that can specialise loops and verify that its action 
preserves semantics, relying on the formalisations above (§ 6). 

4. We integrate our inliner into the PureCake compiler and its verification (§ 7). 


All of our work is mechanised using the HOL4 interactive theorem prover, and 
our development is open-source.³ To the best of our knowledge, ours is the first
verified inliner for a lazy functional programming language, and the first verified 
loop specialiser for any functional language. 


2 The Inliner by Example 


We begin with a high-level explanation of how our inliner works, before diving 
into verification details in later sections. We will show the transformations the 


3 https://github.com/cakeml/pure, see also our artifact hosted on Zenodo [9]. 
inliner performs step-by-step. As a running example, we use the code from the
previous section with one modification: we lift (+1) to a separate function add1 
for clarity. The input code after this modification is as follows: 


suc_list = map add1


add1 i = i + 1


map f [] = []
map f (x:xs) = f x : map f xs


main = ...


Our inliner is installed very early in the PureCake compiler, directly after 
parsing and binding group analysis. Binding group analysis processes the pro- 
gram above to the code below, breaking up the mutually recursive bindings into 
a nesting of let-expressions. Note that there is no dependency between add1 and
map, so their definitions could be reordered; for this example we put add1 first.


let add1 i = i + 1 in
let map f l = case l of
                []     -> []
                (x:xs) -> f x : map f xs in
let suc_list = map add1 in
let main = ... in main


The inliner receives this program as input. As it traverses the program, it 
records known definitions that it may wish to inline later on. In particular, it 
maintains a mapping from names to their definitions, which starts off empty. 
Therefore, after processing line 18 (i.e., the definition of add1), the mapping
contains only the definition of add1, that is, \i -> i + 1.

The inliner then moves to line 19, the let-expression that defines map. The 
definition of map is recursive, so the inliner analyses it to determine whether any 
of its arguments remain constant over all recursive calls. In the case of map, it 
finds that the first argument, f, remains constant. This means that it can loop 
specialise map to produce the following equivalent definition. 


let map f =
      let map' l = case l of
                     []     -> []
                     (x:xs) -> f x : map' xs
      in map'
in ...


Our inliner does not alter the definition of map in the program, but it does add 
this equivalent definition to its mapping of known definitions. We will very soon 
see why it is useful to pull out the constant argument f. 
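
One simple way to detect arguments that remain constant over all recursive calls is sketched below in Haskell. This is not the verified implementation: the miniature expression type, the function names, and the treatment of bare references are all assumptions made for illustration, and variable names are assumed to be unique so that shadowing can be ignored.

```haskell
-- A miniature expression type, just large enough to express map.
data Expr
  = Var String
  | App Expr [Expr]                        -- application to a list of arguments
  | Nil
  | Cons Expr Expr
  | Case Expr Expr (String, String, Expr)  -- case e of { [] -> e1; (x:xs) -> e2 }
  deriving (Eq, Show)

-- Argument lists of every call to fn inside an expression. A bare reference
-- to fn counts as a call with no known arguments, so no parameter is then
-- considered constant.
callsTo :: String -> Expr -> [[Expr]]
callsTo fn expr = case expr of
  Var g
    | g == fn   -> [[]]
    | otherwise -> []
  App (Var g) args
    | g == fn -> args : concatMap (callsTo fn) args
  App f args -> callsTo fn f ++ concatMap (callsTo fn) args
  Nil -> []
  Cons h t -> callsTo fn h ++ callsTo fn t
  Case s n (_, _, c) -> concatMap (callsTo fn) [s, n, c]

-- Parameters passed unchanged, in the same position, at every recursive call.
-- Intended for recursive definitions: with no recursive call at all, every
-- parameter trivially qualifies.
constantParams :: String -> [String] -> Expr -> [String]
constantParams fn params body =
  [p | (k, p) <- zip [0 ..] params, all (unchanged k p) calls]
  where
    calls = callsTo fn body
    unchanged k p args = case drop k args of
      Var q : _ -> q == p
      _         -> False

-- Body of: map f l = case l of { [] -> []; (x:xs) -> f x : map f xs }
mapBody :: Expr
mapBody =
  Case (Var "l") Nil
    ("x", "xs", Cons (App (Var "f") [Var "x"])
                     (App (Var "map") [Var "f", Var "xs"]))
```

For the running example, `constantParams "map" ["f", "l"] mapBody` yields `["f"]`: only the first argument is passed through unchanged, so it is the one that can be pulled out.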
The inliner moves on to the definition of suc_list on line 22.


let suc_list = map add1 in ...


After pulling out the constant argument f above, the inliner considers map to
be a single-argument function. Therefore, the application map add1 here seems
fully applied and the inliner will rewrite it. First, it transforms map add1 into
the following.


let suc_list =
      let f = add1 in
      let map' l = case l of
                     []     -> []
                     (x:xs) -> f x : map' xs
      in map'
in ...


Notice the use of a binding let f = add1 to assign the constant argument f of
map. Then, the inliner recurses into this expression, replacing f by add1 in the
second row of the pattern match:


(x:xs) -> add1 x : map' xs


The inliner recurses again into the modified subexpression add1 x, and realises
that add1 (which is mapped to \i -> i + 1) is fully applied. Therefore, it inlines
add1 too:


(x:xs) -> (let i = x in i + 1) : map' xs


Once again, the inliner recurses on the modified subexpression, turning the
innermost i into x:


(x:xs) -> (let i = x in x + 1) : map' xs


The final code produced by the inliner is below. The definition of suc_list
has been rewritten so extensively that it now resembles a copy of map which has
been specialised to the add1 function.


let add1 i = i + 1 in
let map f l = case l of
                []     -> []
                (x:xs) -> f x : map f xs in
let suc_list =
      let f = add1 in
      let map' l = case l of
                     []     -> []
                     (x:xs) -> (let i = x in x + 1) : map' xs
      in map'
in let main = ... in main


Some dead code remains, e.g., let f = add1 (line 46) and let i = x (line 49).
We perform a simple dead code elimination pass immediately after the inliner
to remove these.
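
A dead code elimination pass of this kind can be very small. The Haskell sketch below removes let-bindings whose variable does not occur free in the body; it is an illustration over a made-up expression type with non-recursive lets, not PureCake's actual pass.

```haskell
-- A miniature expression type (illustrative only; lets are non-recursive here).
data Expr
  = Var String
  | Lit Int
  | Add Expr Expr
  | App Expr Expr
  | Lam String Expr
  | Let String Expr Expr   -- let x = e1 in e2
  deriving (Eq, Show)

freeVars :: Expr -> [String]
freeVars expr = case expr of
  Var x     -> [x]
  Lit _     -> []
  Add a b   -> freeVars a ++ freeVars b
  App f a   -> freeVars f ++ freeVars a
  Lam x b   -> filter (/= x) (freeVars b)
  Let x e b -> freeVars e ++ filter (/= x) (freeVars b)

-- Drop let-bindings whose variable is never used in the (already cleaned) body.
-- In a pure, lazy language an unused binding is never evaluated, so removing
-- it does not change observable behaviour.
elimDead :: Expr -> Expr
elimDead expr = case expr of
  Let x e b
    | x `elem` freeVars b' -> Let x (elimDead e) b'
    | otherwise            -> b'
    where
      b' = elimDead b
  Add a b -> Add (elimDead a) (elimDead b)
  App f a -> App (elimDead f) (elimDead a)
  Lam x b -> Lam x (elimDead b)
  _       -> expr
```

Applied to the leftover `let i = x in x + 1`, encoded as `Let "i" (Var "x") (Add (Var "x") (Lit 1))`, the pass returns `Add (Var "x") (Lit 1)`.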


Single-pass optimisation. Note that our inliner does not make multiple passes 
over input code, in contrast to the presentation above. It performs a single top- 
down pass over its input, calling itself recursively only on function applications 
or variables that it has successfully rewritten. The depth of this recursion is 
bounded by a simple user-configurable recursion limit. 


3 Setting: PureCake 


We implement and verify our inlining and specialisation optimisations as part of 
the verified compiler PureCake. In this section, we describe both the PureCake 
project at a high level, and the key aspects of its formalisation on which we rely. 


What is PureCake? PureCake [10] is an end-to-end verified compiler for a
Haskell-like language known as PURELANG. Here, a “Haskell-like” language is
one which: is purely functional with monadic effects; evaluates lazily; and has a
syntax resembling that of Haskell. PureCake compiles PURELANG to the CakeML
language, which is call-by-value and ML-like, and has an end-to-end verified
compiler [12,14]. CakeML targets machine code, so PureCake and CakeML can be
composed to produce end-to-end guarantees for the compilation of PURELANG
to machine code [10, §6].

The PureCake compiler is designed to be realistic: it accepts a featureful 
input language and generates performant code. This makes it an ideal setting 
for verified inlining and specialisation optimisations. We add these to PureCake 
as PURELANG-to-PURELANG transformations. 


Formalisation details. PURELANG is formalised using two ASTs: compiler
expressions and semantic expressions, denoted ce and e respectively [10, §3.2].
The compiler implementation uses compiler expressions, and their semantics is
given by desugaring into semantic expressions (denoted desugar, of type ce → e).

The call-by-name operational semantics of PURELANG is defined over its simpler
semantic expressions [10, §3.3]. This semantics admits an equational theory
[10, §3.4] which is sound and complete with respect to contextual equivalence.
Its equivalence relation, e1 ≅ e2, is based on an untyped applicative
bisimulation from Abramsky’s lazy λ-calculus [1] and is proved congruent via
Howe’s method [7], i.e., expressions composed of equivalent subexpressions are
themselves equivalent.

PureCake’s compiler passes are verified in two stages. 
1. A binary syntactic relation is defined over semantic expressions (e1 R e2).
The relation is proved to imply e1 ≅ e2, so e1 and e2 have identical observable
behaviour in all contexts. Intuitively, the syntactic relation carves out
an envelope of possible valid transformations, independent of the heuristics
of any real implementation.

2. The implementation is then defined over compiler expressions, with concrete
heuristics. It is verified to perform only those valid transformations expressed
by the syntactic relation.


Composition of the two stages produces the overall proof that the action of the 
compiler implementation preserves semantics. A key benefit of this approach is 
that heuristics remain an implementation detail in stage 2, and can be changed 
without incurring the significant proof obligations of stage 1. 


Approach and paper outline. We can now describe more precisely the steps we 
took to add inlining and loop specialisation to the PureCake compiler. 


§ 4 (stage 1) We defined a relation which captures an envelope of valid inlining 
transformations, and proved that this relation preserves semantics. 

§ 5 We formalised loop specialisation using PURELANG’s equational theory such 
that it can be used in the envelope mentioned above. 

§ 6 (stage 2) We implemented the overall inlining and specialisation transfor- 
mations over compiler expressions, verifying that they fit the envelopes. 

§ 7 We integrated our inliner into the PureCake compiler pipeline and its top- 
level correctness result. 

§ 8 We benchmarked the performance of the output of the inliner. 


4 Inlining as a Relational Envelope 


In this section, we define a relation which characterises all the inlinings that we 
wish to perform. We then prove that any code transformation contained within 
this relational envelope must preserve semantics. 


4.1 Understanding the relation 

We begin by describing the intuition behind our relation. 

Inlining is not substitution. Inlining is a more complex transformation than
substitution or β-conversion. If we were to view inlining as a special case of
these, we would generate unsatisfactory code. In particular, consider the example
below: inlining based on substitution must replace all three occurrences of f with
its definition; inlining based on β-conversion would remove the let-binding.


let f i = 5 in f 1 : map f xs ++ map f ys
By contrast, a real inliner must be able to choose whether to inline a definition
per use of that definition. In other words, the inliner should decide which usages
of a given definition are rewritten on a case-by-case basis. For the example above,
a real inliner should produce the code below. Note that it chooses to inline the
function f only at the usage which fully applies it.


let f i = 5 in (\i -> 5) 1 : map f xs ++ map f ys


Of course, a real inliner would further transform (\i -> 5) 1 into 5 (this is in
fact a β-conversion). For clarity in this example, we do not show that step.
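
The per-use decision can be illustrated with a small Haskell sketch over a made-up expression type. The heuristic encoded here, "rewrite a known name only where it appears applied to an argument and its definition is a lambda", is our simplification of "fully applied" for single-argument functions. Names are assumed to be unique, so shadowing is ignored.

```haskell
-- A miniature expression type (illustrative only).
data Expr
  = Var String
  | Lit Int
  | Lam String Expr
  | App Expr Expr
  | Cons Expr Expr      -- e1 : e2
  | Append Expr Expr    -- e1 ++ e2
  | Let String Expr Expr
  deriving (Eq, Show)

-- Rewrite a use of a known definition only where it is applied to an argument;
-- every other use of the same name is left alone.
inlineApplied :: [(String, Expr)] -> Expr -> Expr
inlineApplied defs expr = case expr of
  App (Var f) arg
    | Just def@(Lam _ _) <- lookup f defs -> App def (go arg)
  App f arg  -> App (go f) (go arg)
  Cons h t   -> Cons (go h) (go t)
  Append a b -> Append (go a) (go b)
  Lam x b    -> Lam x (go b)
  Let x e b  -> Let x (go e) (inlineApplied ((x, e) : defs) b)
  _          -> expr
  where
    go = inlineApplied defs

-- let f i = 5 in f 1 : map f xs ++ map f ys
example :: Expr
example =
  Let "f" (Lam "i" (Lit 5))
    (Cons (App (Var "f") (Lit 1))
          (Append (App (App (Var "map") (Var "f")) (Var "xs"))
                  (App (App (Var "map") (Var "f")) (Var "ys"))))
```

Running `inlineApplied [] example` rewrites only the head of the list to `App (Lam "i" (Lit 5)) (Lit 1)`; the two occurrences of `f` passed to `map` are untouched and the let-binding is kept, mirroring the code shown above.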


Inlining is a series of let transformations. The key intuition behind our 
inlining transformations is as follows. We push let-bindings into expressions as 
far as possible, rewrite the result, then pull the bindings out again. We illustrate 
this by example below, starting from the same initial code as above. 


let f i = 5 in f 1 : map f xs ++ map f ys


We now push in the let-binding which defines f to produce a series of equivalent
expressions. First, we push it in one step past the list constructor (:):


(let f i = 5 in f 1) :
(let f i = 5 in map f xs ++ map f ys)


Next, we push it in through the function application f 1:


(let f i = 5 in f) (let f i = 5 in 1) :
(let f i = 5 in map f xs ++ map f ys)


Now, we choose to rewrite the use of f under the first let f i = 5 to \i -> 5:


(let f i = 5 in (\i -> 5)) (let f i = 5 in 1) :
(let f i = 5 in map f xs ++ map f ys)


Note that we have chosen not to perform any other rewrites of f, because other 
uses of f are not fully applied. 

We can now reverse the pushing in of let-bindings, i.e., we pull them out 
instead. The final result is as follows, where f is inlined exactly as we wanted: 


let f i = 5 in (\i -> 5) 1 : map f xs ++ map f ys 


Stacking let transformations. Above, our example shows how we can inline
a single let-binding: we push it inwards, use it for rewriting, and pull it outwards
back to its original position. We can generalise this straightforwardly to handle
a list of let-bindings. This mimics the implementation of a real inliner, which
must carry with it a collection of definitions it may wish to inline.

Consider the following example, in which an inliner attempts to rewrite the
expression g 3 + 7 and carries definitions f i = 5; h i = 2; g i = f i + 1.


let f i = 5 in
let h i = 2 in
let g i = f i + 1 in
  g 3 + 7


Just as with a single let-binding, we can push in the stack of let-bindings,
rewrite, and pull them out again. This produces the following expression.


let f i = 5 in
let h i = 2 in
let g i = f i + 1 in
  (\i -> (\i -> 5) i + 1) 3 + 7


The only complication in generalising to a stack of let-bindings is that some 
definitions can depend on others. In the example above, the definition of g 
depends on f. This is why we model the bindings as a list: this preserves scoping 
correctly, ensuring we do not break any dependencies between definitions. 

Note that this intuition of pushing in and pulling out of let-bindings applies 
only to the formalisation that justifies our inlining rewrites. The implementation 
of our inliner performs no such push/pull transformations: as one might expect, 
it merely carries around a simple (unordered) map of variable names to their 
definitions. This map represents exactly the set of definitions that the inliner 
may wish to use for rewriting at usage sites. 


4.2 Defining a Semantics-Preserving Envelope 


We now describe an inductive relation, l ⊩ e1 ⇝ e2, which characterises all of
the inlining transformations that we perform. We prove that any transformation
described by the relation lies within the equational theory of PURELANG (≅, § 3).
Therefore, the relation describes only semantics-preserving transformations.

The relation l ⊩ e1 ⇝ e2 should be read as follows: expression e1 can be
transformed into expression e2 under the definitions in the list l. Both e1 and
e2 are PURELANG semantic expressions, and l is a list of definitions. Each such
definition is of the form x ← e, associating name x with semantic expression
e. We will first describe the formal meaning of l ⊩ e1 ⇝ e2, which is best
understood via its soundness theorem, Theorem 1. Then, in the following
subsections, we describe key parts of the definition of ⇝.

Theorem 1 relates derivations of l ⊩ e1 ⇝ e2 with ≅, PURELANG’s equational
theory, assuming pre and lets_ok. The definitions of pre and lets_ok are shown in
Figure 1; they enforce distinct variable names between both the expression e1
and each of the definitions in l to avoid inadvertent clashes or capture.
vars_of l ≝ ⋃ { {x} ∪ freevars e | mem (x ← e) l }
pre l e ≝ barendregt e ∧ boundvars e # vars_of l


lets_ok [] ≝ T
lets_ok ((x ← e)::l) ≝
    x ∉ freevars e ∧ ({x} ∪ freevars e) # {x | ∃e. mem (x ← e) l} ∧ lets_ok l


Fig. 1. The definition of pre and lets_ok. Here, the # predicate returns true only for
disjoint sets: s1 # s2 ≝ (s1 ∩ s2 = ∅).
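
For readers who prefer executable definitions, vars_of and lets_ok can be transcribed into Haskell roughly as follows. This is a sketch under several assumptions: sets are represented as lists, the expression type is a made-up miniature with non-recursive lets, and the transcription follows our reading of Fig. 1. The predicate pre is omitted because it depends on barendregt, whose definition is not given here.

```haskell
import Data.List (intersect)

-- A miniature expression type (illustrative only).
data Expr
  = Var String
  | Lam String Expr
  | App Expr Expr
  | Let String Expr Expr

freevars :: Expr -> [String]
freevars expr = case expr of
  Var x     -> [x]
  Lam x b   -> filter (/= x) (freevars b)
  App f a   -> freevars f ++ freevars a
  Let x e b -> freevars e ++ filter (/= x) (freevars b)

-- s1 # s2: the two sets share no element.
disjoint :: [String] -> [String] -> Bool
disjoint s1 s2 = null (s1 `intersect` s2)

-- vars_of l: every defined name together with the free variables of its body.
varsOf :: [(String, Expr)] -> [String]
varsOf l = concat [x : freevars e | (x, e) <- l]

-- lets_ok: a definition is not self-referential, and neither its name nor its
-- free variables are rebound by a definition further along the list.
letsOk :: [(String, Expr)] -> Bool
letsOk [] = True
letsOk ((x, e) : l) =
  x `notElem` freevars e
    && disjoint (x : freevars e) (map fst l)
    && letsOk l
```

With `f` defined before a `g` that mentions `f`, `letsOk` holds; swapping the two definitions makes it fail, because `f` would then be rebound after `g` has referred to it.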


Theorem 1. Soundness of l ⊩ e1 ⇝ e2.


⊢ l ⊩ e1 ⇝ e2 ∧ pre l e1 ∧ lets_ok l ⟹ lets l e1 ≅ lets l e2


where lets [] e ≝ e and lets ((x ← e′)::l) e ≝ let x = e′ in (lets l e)


In particular, expressions e1 and e2 related in the context of definitions l produce
equal expressions (according to ≅) under the stack of let-bindings corresponding
to l. The latter correspondence is encapsulated by the definition of lets, which
nests let-bindings. This theorem is proved by induction over the derivation of
l ⊩ e1 ⇝ e2. In upcoming subsections, we will examine key rules of ⇝ and their
cases in this inductive proof.

When the inliner is first invoked, it is passed an entire PURELANG program
and has no knowledge of any definitions. In other words, its mapping of variable
names to known definitions is empty, corresponding to the list $l$ being empty ($[\,]$).
In this case, we can simplify Theorem 1 by instantiating $l \mapsto [\,]$, and unfolding
the definitions of pre and lets_ok. This produces the following theorem:

Theorem 2. Soundness of $[\,] \vdash e_1 \leadsto e_2$.

$$
\vdash\; [\,] \vdash e_1 \leadsto e_2 \;\land\; \mathsf{barendregt}\ e_1 \;\land\; \mathsf{closed}\ e_1 \;\Longrightarrow\; e_1 \cong e_2
$$


We can read this as follows: if we can transform some closed $e_1$ which satisfies
barendregt to some $e_2$ according to $\leadsto$, then $e_1$ and $e_2$ are equivalent. The
barendregt predicate restricts the variable naming convention within $e_1$ to avoid
problems with variable capture, because PURELANG has explicit names. In
particular, barendregt is the well-known Barendregt variable convention that enforces
unique free/bound variable names across an entire program [3].

The precise definition of barendregt is not necessary here. Suffice it to say
that in order to discharge this assumption, our inliner implementation will rely
on a freshening pass. This pass α-renames programs such that they obey the
Barendregt variable convention, and therefore satisfy barendregt.




Reflexivity. We must allow the inliner to choose whether to rewrite a usage
site on a case-by-case basis (§ 4.1). Therefore, the inliner must be allowed not
to inline, i.e., it must be able to leave an expression unchanged. Hence the
$\leadsto$ relation has a reflexivity rule:

$$
\dfrac{}{\;l \vdash e \leadsto e\;}\;\textsc{Refl}
$$

The REFL case of the proof of Theorem 1 boils down to showing the equation
$\mathsf{lets}\ l\ e \cong \mathsf{lets}\ l\ e$, which is trivial due to reflexivity of $\cong$.


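Inlining. The simplest rule for inlining uses a definition found in the list $l$
(where mem denotes list membership) to rewrite a variable:

$$
\dfrac{\mathsf{mem}\ (x \leftarrow e)\ l}{l \vdash \mathsf{var}\ x \leadsto e}\;\textsc{Inline}
$$

In particular, if $l$ associates name $x$ with definition $e$, then the variable $\mathsf{var}\ x$ can
be replaced by expression $e$. The INLINE case of Theorem 1 requires establishing:

$$
\vdash\; \mathsf{mem}\ (x \leftarrow e)\ l \;\land\; \mathsf{lets\_ok}\ l \;\land\; \mathsf{pre}\ l\ (\mathsf{var}\ x) \;\Longrightarrow\; \mathsf{lets}\ l\ (\mathsf{var}\ x) \cong \mathsf{lets}\ l\ e
$$

Proof outline. We first derive a lemma that allows us to duplicate a let-binding
from $l$, assuming lets_ok (defined in Figure 1 such that it enables this lemma):

$$
\vdash\; \mathsf{mem}\ (x \leftarrow e)\ l \;\land\; \mathsf{lets\_ok}\ l \;\Longrightarrow\; \mathsf{lets}\ l\ e' \cong \mathsf{lets}\ l\ (\mathsf{let}\ x = e\ \mathsf{in}\ e') \qquad \textsc{Let-Dup}
$$

Equipped with the LET-DUP lemma, we proceed as follows:

$$
\begin{array}{rcll}
\mathsf{lets}\ l\ (\mathsf{var}\ x) & \cong & \mathsf{lets}\ l\ (\mathsf{let}\ x = e\ \mathsf{in}\ \mathsf{var}\ x) & \text{(LET-DUP)} \\
& \cong & \mathsf{lets}\ l\ e & \text{(trivial)}
\end{array}
$$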


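Let. We can now inline known definitions, but we must be able to learn those
definitions in the first place. The rule LET allows us to add a let-bound definition
to the stack $l$, using the append operator ($+\!\!+$).

$$
\dfrac{l \vdash e_1 \leadsto e_1' \qquad l \mathbin{+\!\!+} [x \leftarrow e_1] \vdash e_2 \leadsto e_2'}{l \vdash (\mathsf{let}\ x = e_1\ \mathsf{in}\ e_2) \leadsto (\mathsf{let}\ x = e_1'\ \mathsf{in}\ e_2')}\;\textsc{Let}
$$

Proof outline. LET case of Theorem 1.

$$
\begin{array}{rll}
& \mathsf{lets}\ l\ (\mathsf{let}\ x = e_1\ \mathsf{in}\ e_2) & \\
\cong & \mathsf{lets}\ (l \mathbin{+\!\!+} [x \leftarrow e_1])\ e_2 & \text{(definition of lets)} \\
\cong & \mathsf{lets}\ (l \mathbin{+\!\!+} [x \leftarrow e_1])\ e_2' & \text{(IH for } e_2\text{)} \\
\cong & \mathsf{lets}\ l\ (\mathsf{let}\ x = e_1\ \mathsf{in}\ e_2') & \text{(definition of lets)} \\
\cong & \mathsf{let}\ x = (\mathsf{lets}\ l\ e_1)\ \mathsf{in}\ (\mathsf{lets}\ l\ e_2') & \text{(push in lets)} \\
\cong & \mathsf{let}\ x = (\mathsf{lets}\ l\ e_1')\ \mathsf{in}\ (\mathsf{lets}\ l\ e_2') & \text{(IH for } e_1\text{)} \\
\cong & \mathsf{lets}\ l\ (\mathsf{let}\ x = e_1'\ \mathsf{in}\ e_2') & \text{(pull out lets)}
\end{array}
$$

Above, we can push and pull lets through let because the precondition pre
enforces sufficiently distinct variable names.

Note that this rule records the unmodified expression $e_1$ in the stack of known
definitions $l$. It could instead use the $\leadsto$-transformed expression $e_1'$. The proof
strategy with this modification is essentially unchanged, except we must reverse
our applications of the inductive hypotheses.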


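Congruences. We must be able to apply $\leadsto$ within subexpressions. Therefore,
we have several congruence rules, such as the following:

$$
\dfrac{l \vdash e_1 \leadsto e_1' \qquad l \vdash e_2 \leadsto e_2'}{l \vdash (e_1 \cdot e_2) \leadsto (e_1' \cdot e_2')}\;\textsc{App-Cong}
\qquad\qquad
\dfrac{l \vdash e \leadsto e'}{l \vdash (\lambda x.\, e) \leadsto (\lambda x.\, e')}\;\textsc{Lam-Cong}
$$

$$
\dfrac{\forall i.\; l \vdash e_i \leadsto e_i' \qquad l \vdash e \leadsto e'}{l \vdash (\mathsf{letrec}\ \overline{x_n = e_n}\ \mathsf{in}\ e) \leadsto (\mathsf{letrec}\ \overline{x_n = e_n'}\ \mathsf{in}\ e')}\;\textsc{Letrec-Cong}
$$

Each such case in Theorem 1 requires showing that we can push/pull lets
into/out of subexpressions. Once again, the precondition pre permits this by
enforcing sufficiently distinct variable names. The remainder of the proof follows
from congruence of $\cong$.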
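To make the rules presented so far concrete, the sketch below decides membership in the fragment of the relation generated by REFL, INLINE, LET, APP-CONG and LAM-CONG only. It is an illustration rather than the paper's development: the tuple AST and the function name `related` are assumptions, and the remaining rules of the relation are deliberately left out.

```python
# Hypothetical tuple AST: ("var", x) | ("app", e1, e2) | ("lam", x, e) | ("let", x, e1, e2)
# defs is a list of (name, expression) pairs playing the role of the list l.

def related(defs, e1, e2):
    """Decide l |- e1 ~> e2 using only REFL, INLINE, LET and two congruences."""
    if e1 == e2:                                   # REFL
        return True
    if e1[0] == "var":                             # INLINE: mem (x <- e2) l
        return (e1[1], e2) in defs
    if e1[0] != e2[0]:
        return False
    if e1[0] == "app":                             # APP-CONG
        return related(defs, e1[1], e2[1]) and related(defs, e1[2], e2[2])
    if e1[0] == "lam":                             # LAM-CONG (same binder name)
        return e1[1] == e2[1] and related(defs, e1[2], e2[2])
    if e1[0] == "let":                             # LET: record the unmodified e1
        return (e1[1] == e2[1]
                and related(defs, e1[2], e2[2])
                and related(defs + [(e1[1], e1[2])], e1[3], e2[3]))
    return False
```

For example, `let x = a in x x` is related to `let x = a in a x` under the empty list, because the LET rule records `x <- a` before the body is examined.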


Simplification. The following rule allows $\leadsto$ to carry out any transformation
that preserves $\cong$:

$$
\dfrac{l \vdash e_1 \leadsto e_2 \qquad e_2 \cong e_2'}{l \vdash e_1 \leadsto e_2'}\;\textsc{Simp}
$$

The SIMP case in Theorem 1 is a direct consequence of the transitivity of $\cong$.
This rule permits the inliner to modify (and in particular, simplify) generated
expressions during its operation. There are two important uses of this ability:

— Turning fully applied λ-abstractions into a stack of let-bindings. This allows
  recursive applications of inlining (see rule TRANS below).

$$
(\lambda x_1.\, \lambda x_2.\, \ldots\, \lambda x_n.\, e) \cdot e_1 \cdot e_2 \cdot \ldots \cdot e_n \;\cong\; \mathsf{lets}\ [x_1 \leftarrow e_1,\ x_2 \leftarrow e_2,\ \ldots,\ x_n \leftarrow e_n]\ e \qquad (1)
$$

— Freshening names of bound variables (i.e., α-renaming). This happens
  directly before application of the rule TRANS below.
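The conversion in eq. (1) can be sketched as follows. This is a hedged illustration, not the paper's implementation: the tuple AST is hypothetical, and treating an application with more arguments than leading λ-binders as a failure is an assumption of this sketch. It also assumes the input has already been freshened, so that no $x_i$ occurs free in a later argument.

```python
# Hypothetical tuple AST: ("var", x) | ("app", e1, e2) | ("lam", x, e) | ("let", x, e1, e2)

def convert_to_lets(e):
    """Turn (\\x1 ... xn. body) . a1 . ... . an into nested lets, else None."""
    args = []
    while e[0] == "app":          # unwind the application spine, last argument first
        args.append(e[2])
        e = e[1]
    args.reverse()
    params = []
    while e[0] == "lam":          # peel off the leading binders
        params.append(e[1])
        e = e[2]
    if not params or len(params) != len(args):
        return None               # not a fully applied lambda-abstraction
    for x, a in reversed(list(zip(params, args))):
        e = ("let", x, a, e)      # innermost binding is built first
    return e
```

Applied to `(\x. \y. x) . a . b`, the sketch yields `let x = a in let y = b in x`; a partial application such as `(\x. \y. x) . a` yields `None`.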


Transitivity. To permit recursion into recently inlined expressions, $\leadsto$ has a
transitivity rule:

$$
\dfrac{l \vdash e_1 \leadsto e_2 \qquad l \vdash e_2 \leadsto e_3 \qquad \mathsf{pre}\ l\ e_2}{l \vdash e_1 \leadsto e_3}\;\textsc{Trans}
$$

In particular, $e_1$ can be transformed to $e_3$ if there is some intervening $e_2$ which
can act as a stepping stone.

Unusually, we require precondition pre to hold of intermediate expression $e_2$.
This is demanded by the proof of Theorem 1, in which we can only instantiate
inductive hypotheses if we first establish pre. Unfortunately, $l \vdash e_1 \leadsto e_2$
and $\mathsf{pre}\ l\ e_1$ are not enough to derive $\mathsf{pre}\ l\ e_2$. Fortunately, we can freshen
bound variable names (i.e., α-rename) sufficiently to establish pre, and justify
this freshening using rule SIMP above.


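Specialisation. The $\leadsto$ relation must be able to support loop specialisation, as
described for the map function in § 2. Therefore, it has a rule SPEC which permits
conversion of a letrec into a let, as long as there is a proof that the conversion
preserves $\cong$.

$$
\dfrac{\begin{array}{c}
l \vdash e_1 \leadsto e_1' \qquad (\forall e.\ \mathsf{letrec}\ x = e_1\ \mathsf{in}\ e \cong \mathsf{let}\ x = e_2\ \mathsf{in}\ e) \\[0.5ex]
l \mathbin{+\!\!+} [x \leftarrow e_2] \vdash e_3 \leadsto e_3' \qquad \mathsf{disjoint\_names}\ e_2\ e_3 \qquad x \notin \mathsf{freevars}\ e_2
\end{array}}{l \vdash \mathsf{letrec}\ x = e_1\ \mathsf{in}\ e_3 \leadsto \mathsf{letrec}\ x = e_1'\ \mathsf{in}\ e_3'}\;\textsc{Spec}
$$

That is, if we can $\cong$-convert some letrec $x = e_1$ to some let $x = e_2$, then we can
append $x \leftarrow e_2$ to the stack of known definitions when processing letrec body
$e_3$. Again, we require restrictions on variable naming: the variables bound in $e_2$
and $e_3$ must be disjoint, and the bound variable $x$ must not appear free in $e_2$.

Proof outline. SPEC case of Theorem 1.

$$
\begin{array}{rll}
& \mathsf{lets}\ l\ (\mathsf{letrec}\ x = e_1\ \mathsf{in}\ e_3) & \\
\cong & \mathsf{lets}\ l\ (\mathsf{let}\ x = e_2\ \mathsf{in}\ e_3) & \text{(assumption of rule)} \\
\cong & \mathsf{lets}\ (l \mathbin{+\!\!+} [x \leftarrow e_2])\ e_3 & \text{(definition of lets)} \\
\cong & \mathsf{lets}\ (l \mathbin{+\!\!+} [x \leftarrow e_2])\ e_3' & \text{(IH for } e_3\text{)} \\
\cong & \mathsf{lets}\ l\ (\mathsf{let}\ x = e_2\ \mathsf{in}\ e_3') & \text{(definition of lets)} \\
\cong & \mathsf{lets}\ l\ (\mathsf{letrec}\ x = e_1\ \mathsf{in}\ e_3') & \text{(assumption of rule, symmetry of } \cong\text{)} \\
\cong & \mathsf{letrec}\ x = (\mathsf{lets}\ l\ e_1)\ \mathsf{in}\ \mathsf{lets}\ l\ e_3' & \text{(push in lets)} \\
\cong & \mathsf{letrec}\ x = (\mathsf{lets}\ l\ e_1')\ \mathsf{in}\ \mathsf{lets}\ l\ e_3' & \text{(IH for } e_1\text{)} \\
\cong & \mathsf{lets}\ l\ (\mathsf{letrec}\ x = e_1'\ \mathsf{in}\ e_3') & \text{(pull out lets)}
\end{array}
$$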


5 Specialisation of Recursive Bindings 


Our example in § 2 showed that our inliner can specialise applications of recursive
functions such as map to known arguments such as add1. This is possible
whenever constant arguments such as f can be pulled out of the recursion. That
is, whenever we can transform recursive functions like map (left) into equivalent
code which makes the constant argument explicit using map' (right):

    let map f l =                        let map f = let map' l =
      case l of                                        case l of
        [] -> []                                         [] -> []
        (x:xs) -> f x : map f xs                         (x:xs) -> f x : map' xs
                                                     in map'

In this section, we describe how we prove correctness of such transformations.
Critically, our proofs can be used in the SPEC rule of $\leadsto$ from the previous section.
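The two shapes of map can be mirrored in Python to see that they compute the same results on finite lists. This is only an analogy: Python is strict whereas PURELANG is lazy, and the names `map_general` and `map_specialised` are hypothetical.

```python
def map_general(f, l):
    """Left-hand form: f is passed along unchanged on every recursive call."""
    if not l:
        return []
    return [f(l[0])] + map_general(f, l[1:])

def map_specialised(f):
    """Right-hand form: f is fixed once; the inner loop map_prime only takes l."""
    def map_prime(l):
        if not l:
            return []
        return [f(l[0])] + map_prime(l[1:])
    return map_prime

def add1(n):
    return n + 1
```

Here `map_specialised(add1)` plays the role of a loop specialised to `add1`: once `f` is known, its body can be inlined into `map_prime` without touching the recursion.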


5.1 Understanding Specialisation 


Like $\leadsto$, our specialisation transformation is justified using equational reasoning.
We illustrate the equational steps below, again noting that the implementation
is much more direct. We use the map example of § 1, eliding parts not relevant
to specialisation. The input is therefore as follows:

    let map f l = ... f x ... map f xs ...

We first make a local copy of the recursive definition map, named map':

    let map = let map' f l = ... f x ... map' f xs ...
              in map'

We then η-expand the final usage of the copy map':

    let map = let map' f l = ... f x ... map' f xs ...
              in \f l -> map' f l

Next, we pull out the new λ-abstractions to the top-level:

    let map f l = let map' f l = ... f x ... map' f xs ...
                  in map' f l

We then α-rename the constant argument in the copy (here, f becomes g):

    let map f l = let map' g l = ... g x ... map' g xs ...
                  in map' f l

The first major step (transform 1) replaces the constant argument g with the
known value to which the function map' is always applied, f:

    let map f l = let map' g l = ... f x ... map' f xs ...
                  in map' f l

The second major step (transform 2) deletes the now unused argument g. It
removes the argument from both the definition of map' and all calls to map':

    let map f l = let map' l = ... f x ... map' xs ...
                  in map' l

We push back in some of the top-level λ-abstractions, in this case just l:

    let map f = let map' l = ... f x ... map' xs ...
                in \l -> map' l

Finally, η-contraction removes the λ-abstraction over l:

    let map f = let map' l = ... f x ... map' xs ...
                in map'

Most of the steps are straightforwardly justified in PURELANG’s equational theory.
However, the steps marked transform 1 and transform 2 are more involved.
We discuss these below.


5.2 Key Lemmas for Specialisation 


Both transform 1 and transform 2 require a substitution-like traversal of the
entire subexpression under consideration. It is not clear how to justify these
traversals using simple equational reasoning in PURELANG’s theory. Therefore,
we resort to more cumbersome simulation proofs to establish $\cong$ by appealing to
its definition in terms of PURELANG’s operational semantics.

For transform 1, we prove a theorem of the following form. Here call_with_arg
holds only if every application of $f$ in $e$ is applied to $\mathsf{var}\ y$ after $n$ arguments,
and the names $f$ and $y$ are never rebound within $e$.

$$
\begin{array}{l}
\vdash\; \mathsf{call\_with\_arg}\ f\ \overline{x_n}\ y\ e \;\land\; \ldots \\
\quad \Longrightarrow\; \mathsf{letrec}\ f = (\lambda \overline{x_n}.\, \lambda y.\, e)\ \mathsf{in}\ ((\mathsf{var}\ f) \cdot \overline{e_n} \cdot (\mathsf{var}\ w) \cdot \overline{e_m}) \\
\quad \phantom{\Longrightarrow}\; \cong\; \mathsf{letrec}\ f = (\lambda \overline{x_n}.\, \lambda y.\, e[\mathsf{var}\ w / y])\ \mathsf{in}\ ((\mathsf{var}\ f) \cdot \overline{e_n} \cdot (\mathsf{var}\ w) \cdot \overline{e_m})
\end{array}
$$

Though the variable $w$ is free in the theorem above, it is a closed constant
expression in most parts of the proof, which simplifies the derivation of this
theorem. This is because $\cong$ is defined over open terms in terms of closing substitution
and a relation over closed terms. The proof of this theorem is a large simulation
based on the semantics of PURELANG.
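As a rough executable reading of the call_with_arg side condition, the sketch below checks a hypothetical tuple AST. It is an assumption-laden approximation, not the paper's inductive definition: in particular, it treats any occurrence of f that is not applied to more than n arguments as a violation.

```python
# Hypothetical tuple AST: ("var", x) | ("app", e1, e2) | ("lam", x, e) | ("let", x, e1, e2)

def call_with_arg(f, n, y, e):
    """Every use of f in e has (var y) as its argument at index n (0-based),
    and neither f nor y is rebound inside e."""
    tag = e[0]
    if tag == "var":
        return e[1] != f              # a bare f is not applied to y at all
    if tag == "app":
        head, args = e, []
        while head[0] == "app":       # collect the argument spine
            args.append(head[2])
            head = head[1]
        args.reverse()
        if head == ("var", f):
            ok_here = len(args) > n and args[n] == ("var", y)
        else:
            ok_here = call_with_arg(f, n, y, head)
        return ok_here and all(call_with_arg(f, n, y, a) for a in args)
    if tag == "lam":
        return e[1] not in (f, y) and call_with_arg(f, n, y, e[2])
    if tag == "let":
        return (e[1] not in (f, y)
                and call_with_arg(f, n, y, e[2])
                and call_with_arg(f, n, y, e[3]))
    raise ValueError(f"unknown expression: {e!r}")
```

In the map example after α-renaming, the recursive call `map' g xs` satisfies the check with `n = 0` and `y = "g"`, whereas a call such as `map' h xs` would not.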

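For transform 2, we prove a theorem with a similar shape. This time,
remove_call_arg is an inductive relation that ensures $y$ never appears in $e_1$ and relates
$e_1$ to a second expression $e_2$ in which the relevant argument has been removed
from each application of $f$.

$$
\begin{array}{l}
\vdash\; \mathsf{remove\_call\_arg}\ f\ \overline{x_n}\ y\ \overline{z_m}\ e_1\ e_2 \;\land\; \ldots \\
\quad \Longrightarrow\; \mathsf{letrec}\ f = (\lambda \overline{x_n}.\, \lambda y.\, \lambda \overline{z_m}.\, e_1)\ \mathsf{in}\ ((\mathsf{var}\ f) \cdot \overline{e_n} \cdot (\mathsf{var}\ y) \cdot \overline{e_m}) \\
\quad \phantom{\Longrightarrow}\; \cong\; \mathsf{letrec}\ f = (\lambda \overline{x_n}.\, \lambda \overline{z_m}.\, e_2)\ \mathsf{in}\ ((\mathsf{var}\ f) \cdot \overline{e_n} \cdot \overline{e_m})
\end{array}
$$

We prove this theorem by a large simulation too. The simulation strategy is
necessary because letrec causes (potentially non-terminating) recursion.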
6 Implementing a Correct Inliner


In this section, we describe the implementation of our inliner and the proof that
its action lies within the $\leadsto$ relation described in § 4. We also touch on three
other transformations mentioned previously: specialisation, freshening of bound
variables, and dead code elimination. Our inliner relies on all three.


6.1 Preliminaries 


We implement our inliner within a state monad with the following type:

$$
\alpha\ \mathsf{M} \;\stackrel{\text{def}}{=}\; \mathsf{name\ set} \to (\alpha,\ \mathsf{name\ set})
$$

Here, name set is a set of variable names; we will see its usage shortly. This monad
has standard return/bind operators, and we will use Haskell-style do-notation to
show definitions written within the monad.
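A state monad of this shape can be sketched in Python as functions from a name set to a pair. The names `unit`, `bind` and `fresh`, and the numbering scheme used to invent names, are illustrative assumptions rather than the paper's definitions.

```python
def unit(a):
    """return: yield a and leave the name set untouched."""
    return lambda names: (a, names)

def bind(m, f):
    """Run m, then feed its result and the updated name set to f."""
    def run(names):
        a, names1 = m(names)
        return f(a)(names1)
    return run

def fresh(base):
    """A monadic action inventing a name absent from the set and recording it."""
    def run(names):
        candidate, i = base, 0
        while candidate in names:
            i += 1
            candidate = f"{base}{i}"
        return candidate, names | {candidate}
    return run

# Two fresh names in sequence; the second sees the name chosen by the first.
two_fresh = bind(fresh("x"), lambda a: bind(fresh("x"), lambda b: unit((a, b))))
```

Running `two_fresh(frozenset({"x"}))` threads the set through both actions, so the two invented names differ from each other and from `x`.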

The inliner itself has the following signature: 


$$
\mathsf{inline} : (h : \mathsf{heuristic}) \to (k : \mathsf{num}) \to (m : (\mathsf{name} \mapsto ce)) \to ce \to ce\ \mathsf{M}
$$


In other words, the inliner transforms compiler expressions to compiler expres- 
sions within the state monad, requiring several other inputs: 


— An unordered mapping m from names to expressions. This is the “memory” 
of the inliner: the set of known definitions which it can use for rewriting. 

— Heuristic h decides whether to “remember” a definition for future inlining. It 
accepts an expression ce and returns a boolean: if true, the definition should 
be remembered. 

— Natural number k is the recursion limit for the inliner, used to bound its 
recursion into rewritten expressions. 

— The name set parameter hidden within the monad keeps track of all variable 
names (whether bound or free) in input expression ce. It is used to ensure 
that sufficiently fresh variable names are chosen when freshening the names 
of bound variables. 


6.2 Inliner implementation 


The inliner traverses compiler expressions top-down. During the traversal, it 
performs two key operations: rewriting a variable to a known definition from 
memory, and adding a new definition to memory. 


Rewriting a variable. There are two kinds of expressions in which the inliner
will attempt to rewrite a variable. The first is a lone variable (of the form $\mathsf{var}\ x$),
and the second is an application of a variable to some arguments (of the form
$(\mathsf{var}\ x) \cdot \ldots$). The latter case is used to inline fully applied functions only.
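In the lone variable case, the inliner is defined as follows:

$$
\mathsf{inline}^h_k\ m\ (\mathsf{var}\ x) \;\stackrel{\text{def}}{=}\;
\begin{cases}
\mathsf{return}\ (\mathsf{var}\ x) & x \notin \mathsf{domain}\ m \;\lor\; k = 0 \;\lor\; m(x) = \lambda y.\, \ldots \\
\mathsf{inline}^h_{k-1}\ m\ ce & m(x) = ce
\end{cases}
$$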


That is, on encountering a free variable $x$ the inliner does one of the following:

— Leaves the variable unchanged if the definition of $x$ is unknown, or the
  recursion limit has been reached, or the definition of $x$ is known to be a
  λ-abstraction. The last case may seem unusual, but note we do not rewrite
  variables to λ-abstractions unless the result will be fully applied. This is
  handled in the application case below.

— Rewrites the variable by inserting the expression $ce$ found in memory, and
  then recurses into $ce$ with a decremented recursion limit.

In the application case, the inliner is defined as follows:


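$$
\begin{array}{l}
\mathsf{inline}^h_k\ m\ ((\mathsf{var}\ x) \cdot ce_1 \cdot \ldots \cdot ce_n) \;\stackrel{\text{def}}{=}\; \mathsf{do} \\
\quad [ce_1', \ldots, ce_n'] \leftarrow \mathsf{mapM}\ (\mathsf{inline}^h_k\ m)\ [ce_1, \ldots, ce_n]; \\
\quad \mathsf{if}\ x \notin \mathsf{domain}\ m \lor k = 0\ \mathsf{then}\ \mathsf{return}\ ((\mathsf{var}\ x) \cdot ce_1' \cdot \ldots \cdot ce_n')\ \mathsf{else}\ \mathsf{do} \\
\qquad ce \leftarrow \mathsf{freshen}\ (m(x) \cdot ce_1' \cdot \ldots \cdot ce_n'); \qquad\qquad (2) \\
\qquad \mathsf{case}\ \mathsf{convert\_to\_lets}\ ce\ \mathsf{of} \\
\qquad \mid\ \mathsf{None} \to \mathsf{return}\ ((\mathsf{var}\ x) \cdot ce_1' \cdot \ldots \cdot ce_n') \\
\qquad \mid\ \mathsf{Some}\ ce' \to \mathsf{inline}^h_{k-1}\ m\ ce'
\end{array}
$$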


That is, on encountering a free variable $x$ applied to $n$ arguments the inliner
does the following:

1. Recurses into the arguments to produce $n$ new arguments.
2. Searches for variable $x$ in memory and checks the recursion limit. If $x$ is not
   found or the recursion limit has been reached, the inliner returns variable $x$
   applied to the $n$ new arguments.
3. Rewrites $x$ using its definition from memory, $m(x)$.
4. Freshens the resulting application of $m(x)$ to the $n$ new arguments.
5. Attempts to convert the freshened application to a series of let-bindings.
   This is precisely the conversion shown in eq. (1). Note that the
   conversion fails (returns None) if $m(x)$ is not fully applied, in which case the
   inliner bails out of inlining the definition of $x$.
6. Recurses into the newly produced series of let-bindings with a decremented
   recursion limit.


The conversion into let-bindings is critical: it allows the inliner to learn the
definitions of the applied arguments $ce_1', \ldots, ce_n'$, for future inlining within the
function body of $m(x)$. Note that we only decrement the recursion limit when the
size of the input expression may not have strictly decreased. This happens only
when performing non-structural recursions, which only occur when we recurse
into a definition rewritten from memory.
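The variable and application cases can be put together in a small Python sketch. It makes several simplifying assumptions that depart from the implementation described here, and each should be read as hypothetical: the AST is a tuple encoding, there is no state monad and no freshening (so inputs must already have sufficiently distinct names), the heuristic is omitted, and only variables and λ-abstractions are remembered at let-bindings.

```python
# Hypothetical tuple AST: ("var", x) | ("app", e1, e2) | ("lam", x, e) | ("let", x, e1, e2)

def convert_to_lets(e):
    """Fully applied lambda-abstraction to nested lets, else None (cf. eq. (1))."""
    args = []
    while e[0] == "app":
        args.append(e[2])
        e = e[1]
    args.reverse()
    params = []
    while e[0] == "lam":
        params.append(e[1])
        e = e[2]
    if not params or len(params) != len(args):
        return None
    for x, a in reversed(list(zip(params, args))):
        e = ("let", x, a, e)
    return e

def inline(k, m, e):
    """k: recursion limit; m: dict from names to remembered definitions."""
    tag = e[0]
    if tag == "var":
        x = e[1]
        if x not in m or k == 0 or m[x][0] == "lam":
            return e                              # leave the variable unchanged
        return inline(k - 1, m, m[x])             # rewrite, then recurse with k - 1
    if tag == "lam":
        return ("lam", e[1], inline(k, m, e[2]))
    if tag == "let":
        e1_new = inline(k, m, e[2])
        # Simplification: remember the unmodified definition if it is a var or lambda.
        m2 = {**m, e[1]: e[2]} if e[2][0] in ("var", "lam") else m
        return ("let", e[1], e1_new, inline(k, m2, e[3]))
    if tag != "app":
        raise ValueError(f"unknown expression: {e!r}")
    head, args = e, []
    while head[0] == "app":                       # step 1: inline the arguments
        args.append(inline(k, m, head[2]))
        head = head[1]
    args.reverse()

    def apply_to_args(h):
        for a in args:
            h = ("app", h, a)
        return h

    if head[0] != "var":
        return apply_to_args(inline(k, m, head))
    if head[1] not in m or k == 0:                # step 2: unknown name or limit hit
        return apply_to_args(head)
    lets_form = convert_to_lets(apply_to_args(m[head[1]]))   # steps 3 and 5
    if lets_form is None:
        return apply_to_args(head)                # not fully applied: bail out
    return inline(k - 1, m, lets_form)            # step 6
```

For the input `let f = \y. y in f a` with `k = 2` and empty memory, the sketch produces `let f = \y. y in let y = a in a`: the call is turned into a let, and the let-bound `y` is then inlined in the body. The leftover bindings are what a dead-let pass would remove.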
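Remembering a new definition. The inliner can remember let- or letrec-bound
expressions.
In the let case, it is defined as follows:

$$
\begin{array}{l}
\mathsf{inline}^h_k\ m\ (\mathsf{let}\ x = ce_1\ \mathsf{in}\ ce_2) \;\stackrel{\text{def}}{=}\; \mathsf{do} \\
\quad ce_1' \leftarrow \mathsf{inline}^h_k\ m\ ce_1; \\
\quad \mathsf{let}\ m' = \mathsf{remember}_h\ m\ (x \leftarrow ce_1); \\
\quad ce_2' \leftarrow \mathsf{inline}^h_k\ m'\ ce_2; \\
\quad \mathsf{return}\ (\mathsf{let}\ x = ce_1'\ \mathsf{in}\ ce_2') \\[1ex]
\mathsf{remember}_h\ m\ (x \leftarrow ce) \;\stackrel{\text{def}}{=}\; \mathsf{if}\ \mathsf{cheap}\ ce \land h\ ce\ \mathsf{then}\ m[x \mapsto ce]\ \mathsf{else}\ m
\end{array}
$$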


That is, the inliner recurses into $ce_1$ (without decrementing the recursion limit),
before memorising the definition $x \leftarrow ce_1$, and recursing into $ce_2$ with the
augmented memory. The function remember records the definition only when two
conditions are satisfied: the definition is cheap, and heuristic $h$ returns true.

As the name suggests, cheap is a predicate that determines whether a definition
is cheap to compute, and so will not slow the program down or cause loss
of value sharing when inlined. The definition of cheap is as follows:

$$
\mathsf{cheap}\ (\mathsf{var}\ x) \stackrel{\text{def}}{=} \mathsf{cheap}\ (\lambda x.\, e) \stackrel{\text{def}}{=} \mathsf{cheap}\ (\mathsf{op}\ [\,]) \stackrel{\text{def}}{=} \mathsf{T} \qquad\qquad \mathsf{cheap}\ \_ \stackrel{\text{def}}{=} \mathsf{F}
$$
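A minimal Python sketch of cheap and remember follows. The tuple AST is hypothetical, and reading the third clause of cheap as "an operator applied to an empty argument list" (encoded here as `("op", name, [])`) is an assumption of this sketch.

```python
# Hypothetical tuple AST: ("var", x) | ("lam", x, e) | ("app", e1, e2) | ("op", name, [args])

def cheap(e):
    """Variables, lambda-abstractions and nullary operators are cheap; nothing else is."""
    if e[0] in ("var", "lam"):
        return True
    return e[0] == "op" and len(e[2]) == 0

def remember(h, m, x, e):
    """Return memory m extended with x -> e when e is cheap and heuristic h accepts it."""
    return {**m, x: e} if cheap(e) and h(e) else m
```

Because `remember` returns a new dictionary instead of mutating `m`, the extended memory is visible only in the let body, matching the scoping in the definition above.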


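In the letrec case, the inliner must also perform specialisation. Its action is
defined as follows:

$$
\begin{array}{l}
\mathsf{inline}^h_k\ m\ (\mathsf{letrec}\ x = ce_1\ \mathsf{in}\ ce_2) \;\stackrel{\text{def}}{=}\; \mathsf{do} \\
\quad ce_1' \leftarrow \mathsf{inline}^h_k\ m\ ce_1; \\
\quad \mathsf{let}\ m' = \mathsf{remember\_rec}_h\ m\ (x \leftarrow ce_1); \\
\quad ce_2' \leftarrow \mathsf{inline}^h_k\ m'\ ce_2; \\
\quad \mathsf{return}\ (\mathsf{letrec}\ x = ce_1'\ \mathsf{in}\ ce_2') \\[1ex]
\mathsf{remember\_rec}_h\ m\ (x \leftarrow ce) \;\stackrel{\text{def}}{=} \\
\quad \mathsf{if}\ \lnot\, \mathsf{can\_specialise}\ (x \leftarrow ce) \lor \lnot\, h\ ce\ \mathsf{then}\ m\ \mathsf{else} \\
\quad \mathsf{let}\ ([w_1^{a_1} \ldots w_n^{a_n}],\ \lambda \overline{y_m}.\, ce') = \mathsf{extract\_const\_args}\ (x \leftarrow ce) \\
\quad \mathsf{in}\ m[x \mapsto \mathsf{specialise}\ x\ [w_1^{a_1} \ldots w_n^{a_n}]\ (\lambda \overline{y_m}.\, ce')]
\end{array}
$$

This mirrors the let case almost exactly. The key difference is the use of
remember_rec instead of remember: this does not check cheap, but does attempt
specialisation (and bails out if it fails). We examine specialisation in the
upcoming subsection.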


Heuristics. So far, we have only implemented one heuristic based on expres- 
sion size: the inliner only remembers definitions that are smaller than a user- 
configurable bound. Our implementation can accept any heuristic function as an 
input, making it straightforward to support new kinds of heuristic. 
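A size-based heuristic of this kind might look as follows in Python. The size measure (one unit per AST node), the tuple AST, and the name `size_heuristic` are assumptions made for illustration; the text does not specify how size is computed.

```python
# Hypothetical tuple AST: ("var", x) | ("lam", x, e) | ("app", e1, e2)
#                       | ("let", x, e1, e2) | ("letrec", x, e1, e2)

def size(e):
    """Count AST nodes."""
    tag = e[0]
    if tag == "var":
        return 1
    if tag == "lam":
        return 1 + size(e[2])
    if tag == "app":
        return 1 + size(e[1]) + size(e[2])
    if tag in ("let", "letrec"):
        return 1 + size(e[2]) + size(e[3])
    raise ValueError(f"unknown expression: {e!r}")

def size_heuristic(bound):
    """Build a heuristic that accepts definitions strictly smaller than bound."""
    return lambda e: size(e) < bound
```

Since a heuristic is just a function from an expression to a boolean, other policies can be swapped in without changing the inliner itself.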
Implementing specialisation. Above, specialise transforms a letrec-binding
into a let-binding before adding it to memory. We rely on two helper functions:
can_specialise and extract_const_args.

The test can_specialise simply checks if we are able to specialise a recursive
body. The body must be a λ-abstraction with some constant arguments. Then,
extract_const_args will extract these constant arguments. It accepts a definition
$x \leftarrow ce$, where we know $ce$ is a λ-abstraction of the form $\lambda \overline{x_n}.\, ce'$. It splits the
formal parameters $\overline{x_n}$ into $x_1 \ldots x_m$ and $x_{m+1} \ldots x_n$, where $m$ is the minimum
number of arguments that $x$ is invoked with recursively in body $ce'$. It further
annotates the $x_1 \ldots x_m$ with annotations $a_1 \ldots a_m$, which describe whether the
arguments remain constant for each recursive call. In the implementation of inline
above, this has produced the annotated variables $\overline{w^a}$ and left the remainder of
the λ-abstraction untouched ($\lambda \overline{y_m}.\, ce'$).
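The splitting and annotation performed by extract_const_args can be approximated as below. This sketch takes the recursive calls as already collected argument lists, which is a simplification; the input format, the boolean annotations, and the behaviour when there are no recursive calls are all assumptions, not the paper's definition.

```python
# Arguments are hypothetical tuple expressions such as ("var", "f").

def extract_const_args(params, recursive_calls):
    """params: formal parameter names of the recursive function.
    recursive_calls: one argument list per recursive call found in the body.
    Returns (annotated, rest), where annotated pairs each of the first m
    parameters with True if every recursive call passes it through unchanged."""
    m = min([len(args) for args in recursive_calls] + [len(params)])
    annotated = []
    for i, p in enumerate(params[:m]):
        is_const = all(args[i] == ("var", p) for args in recursive_calls)
        annotated.append((p, is_const))
    return annotated, params[m:]
```

For map, with parameters `f` and `l` and the single recursive call `map f xs`, the sketch marks `f` as constant and `l` as varying.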

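Then, specialise is defined as follows.

$$
\begin{array}{l}
\mathsf{specialise}\ f\ [w_1^{a_1} \ldots w_n^{a_n}]\ ce \;\stackrel{\text{def}}{=} \\
\quad \mathsf{let}\ (\overline{x_n},\ ce') = \mathsf{specialise\_each}\ f\ [w_1^{a_1} \ldots w_n^{a_n}]\ ce\ \mathsf{in} \\
\quad \mathsf{let}\ (\overline{y_i},\ \overline{z_j}) = \mathsf{drop\_common\_suffix}\ [w_1^{a_1} \ldots w_n^{a_n}]\ \overline{x_n}\ \mathsf{in} \\
\quad \lambda \overline{y_i}.\ \mathsf{letrec}\ f = (\lambda \overline{x_n}.\, ce')\ \mathsf{in}\ (\mathsf{var}\ f) \cdot (\mathsf{var}\ z_1) \cdot \ldots \cdot (\mathsf{var}\ z_j)
\end{array}
$$

That is, it processes each annotated variable in turn, updating their call sites in
body $ce$ (i.e., performing transform 1 and transform 2 from § 5 simultaneously
using specialise_each), producing a new set of formal parameters $\overline{x_n}$. It
determines which of these can be η-contracted (the final step in § 5) with a call to
drop_common_suffix, and then returns the new letrec which accepts constant
arguments $\overline{y_i}$ at the top-level, and has η-contracted constant arguments $\overline{z_j}$ applied
directly already.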


Freshening and Dead-Let Elimination. Our inliner assumes that its input
expression has a variable naming convention which is sufficient to prevent it from
accidentally capturing variables during operation. Therefore, we only give the
inliner expressions which obey the Barendregt variable convention, which asserts
unique bound variable names and disjoint bound/free names [3]. This is achieved
by freshening (α-renaming) bound variables directly before inlining, and further
freshening before recursing into subexpressions taken from the inliner’s memory.
For example, the inliner invokes freshen in eq. (2) above. This is precisely
why the inliner carries around a name set in its state monad: this set contains
all variable names (whether bound or free) of the input expression. Freshening
avoids names in this set when inventing fresh names, and returns an updated
set each time it runs.
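The interplay between freshening and the name set can be sketched as follows. This is an illustrative Python approximation over a hypothetical tuple AST; the naming scheme (`x_1`, `x_2`, ...) and the explicit threading of the set (instead of a monad) are assumptions of the sketch.

```python
# Hypothetical tuple AST: ("var", x) | ("app", e1, e2) | ("lam", x, e) | ("let", x, e1, e2)

def fresh_name(base, names):
    """Pick a variant of base that does not occur in names."""
    i, candidate = 0, base
    while candidate in names:
        i += 1
        candidate = f"{base}_{i}"
    return candidate

def freshen(e, names, env=None):
    """Rename every bound variable to a name outside `names`.
    Returns the renamed expression and the updated name set."""
    env = env or {}                   # maps old bound names to their replacements
    tag = e[0]
    if tag == "var":
        return ("var", env.get(e[1], e[1])), names
    if tag == "app":
        f, names = freshen(e[1], names, env)
        a, names = freshen(e[2], names, env)
        return ("app", f, a), names
    if tag == "lam":
        y = fresh_name(e[1], names)
        body, names = freshen(e[2], names | {y}, {**env, e[1]: y})
        return ("lam", y, body), names
    if tag == "let":                  # non-recursive: e1 is outside the scope of x
        e1, names = freshen(e[2], names, env)
        y = fresh_name(e[1], names)
        e2, names = freshen(e[3], names | {y}, {**env, e[1]: y})
        return ("let", y, e1, e2), names
    raise ValueError(f"unknown expression: {e!r}")
```

When the initial set already contains every bound name of the input, each binder receives a new, distinct name, so two copies of the same λ-abstraction no longer share bound variables.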

The output of the inliner also contains various unused let-bindings. We 
showed such bindings in the example of § 1 (namely, f and i). To remove such 
bindings, we run a dead-let elimination pass directly after the inliner. 
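A dead-let elimination pass of this kind can be sketched as below, again over a hypothetical tuple AST. Dropping an unused binding without evaluating it is assumed to be safe here because the source language is pure and non-strict; the sketch is not the paper's verified pass.

```python
# Hypothetical tuple AST: ("var", x) | ("app", e1, e2) | ("lam", x, e) | ("let", x, e1, e2)

def freevars(e):
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "app":
        return freevars(e[1]) | freevars(e[2])
    if tag == "lam":
        return freevars(e[2]) - {e[1]}
    if tag == "let":
        return freevars(e[2]) | (freevars(e[3]) - {e[1]})
    raise ValueError(f"unknown expression: {e!r}")

def dead_let(e):
    """Remove let-bindings whose variable is unused in the (cleaned) body."""
    tag = e[0]
    if tag == "var":
        return e
    if tag == "app":
        return ("app", dead_let(e[1]), dead_let(e[2]))
    if tag == "lam":
        return ("lam", e[1], dead_let(e[2]))
    if tag == "let":
        body = dead_let(e[3])         # clean the body first, bottom-up
        if e[1] not in freevars(body):
            return body               # binding is dead: drop it
        return ("let", e[1], dead_let(e[2]), body)
    raise ValueError(f"unknown expression: {e!r}")
```

Cleaning the body before testing for usage lets one pass remove chains of bindings that only became dead because an inner binding was removed.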
Including these two auxiliary passes, the top-level definition of the inliner is
as follows:

$$
\begin{array}{l}
\mathsf{inliner}^h_k\ ce \;\stackrel{\text{def}}{=} \\
\quad \mathsf{let}\ (ce',\ \mathit{names}) = \mathsf{freshen}\ ce\ (\mathsf{boundvars}\ ce)\ \mathsf{in} \qquad\qquad (3) \\
\quad \mathsf{let}\ (ce_1,\ \_) = \mathsf{inline}^h_k\ \emptyset\ ce'\ \mathit{names}\ \mathsf{in} \\
\quad \mathsf{dead\_let}\ ce_1
\end{array}
$$

That is, the inliner freshens names, inlines definitions top-down starting with an
empty ($\emptyset$) memory, then removes dead lets. Note that the top-level definition
expects to receive only closed expressions, which is why it only passes bound
variables (boundvars) to freshen. This respects our invariant that the name set
contains all bound and free variable names, as there are no free variables.


6.3 Inliner correctness 


In this section, we prove that the inliner implementation is correct. In the context
of PureCake’s proof strategy as described in § 3:

— (stage 1) Theorem 2 above proved that $\leadsto$ preserves semantics.
— (stage 2) Theorem 3 below will prove that any transformation performed by
  the inliner lies within the $\leadsto$ relation of § 4.

We then compose these results to produce our final soundness theorem: the
output expression of the inliner is equivalent to its corresponding input.


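Theorem 3. inline satisfies $\leadsto$.

$$
\begin{array}{l}
\vdash\; \mathsf{inline}^h_k\ m\ ce\ ns = (ce',\ ns') \;\land\; \mathsf{memory\_rel}_{ns}\ l\ m \;\land \\
\quad \mathsf{barendregt}\ (\mathsf{desugar}\ ce) \;\land\; \mathsf{boundvars}\ ce \mathrel{\#} \mathsf{domain}\ m \;\land \\
\quad \mathsf{freevars}\ ce \cup \mathsf{boundvars}\ ce \subseteq ns \;\land\; \mathsf{wf}\ ce \\
\quad \Longrightarrow\; l \vdash (\mathsf{desugar}\ ce) \leadsto (\mathsf{desugar}\ ce')
\end{array}
$$

That is, after desugaring compiler expressions into semantic expressions (desugar,
see § 3), the action of the inliner for input $ce$, memory $m$, and name set $ns$ lies
within $\leadsto$ for some stacked lets $l$ when the following hold: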


— (memory_rel) $m$ and $l$ contain the same definitions, and each such definition
both satisfies wf below and has bound/free variables within $ns$;

— (barendregt) bound names in ce are unique, and disjoint from free names; 

— the bound variables of ce do not shadow (are disjoint from, #) any variables 
with known definitions, i.e., those in the domain of m; 

— all bound/free variables of ce are within ns; and 

— (wf) ce is well-formed. 


Proof outline. Induction over the implementation function inline. For each case
of the proof, we apply rules of $\leadsto$ to justify each atomic inlining operation.
Theorem 4. Top-level correctness of inliner.

$$
\vdash\; \mathsf{wf}\ ce \;\land\; \mathsf{closed}\ ce \;\Longrightarrow\; (\mathsf{desugar}\ ce) \cong (\mathsf{desugar}\ (\mathsf{inliner}^h_k\ ce))
$$

Proof outline. Composition of Theorem 3 above with Theorem 2, the
soundness theorem for $\leadsto$. Unfolding the definition of inliner, we use the soundness
theorem of freshen, the closed assumption, and the application of inline to empty
memory $\emptyset$ to discharge the preconditions on Theorem 3.


7 Integration into the PureCake Compiler 


We insert the inliner and its associated cleanup of dead let-bindings as
PURELANG-to-PURELANG transformations early in the PureCake compiler. In particular,
the inliner runs directly after parsing and binding group analysis, as shown in
Figure 2. Elimination of dead lets happens directly afterwards.

Unusually, the inliner runs before type inference. Ideally, it would take place 
afterwards: it changes program structure significantly, and type inference should 
execute on code resembling user input to allow direct error-reporting. The rea- 
soning behind this design choice is PureCake’s demand analysis, which facilitates 
strictness optimisations by annotating variables that can be evaluated eagerly. 
We found that running the inliner before demand analysis produces significantly 
better performance (§ 8, Figure 4). However, the soundness proof for demand 
analysis requires it to receive only well-typed input code. To run the inliner 
after type inference and before demand analysis, we would have to prove that 
it preserves well-typing, which is a significant undertaking due to PURELANG’s
untyped AST. Future iterations of PURELANG’s AST are intended to be typed;
therefore, we could consider proving type preservation in future work. 

To update PureCake’s compiler correctness theorem after integrating our in- 
liner, we must establish that the inliner preserves both semantics and various 
syntactic invariants. We have already presented our proof of semantics preserva- 
tion in § 6. The latter syntactic invariants guarantee that compiler expressions 
are closed and satisfy well-formedness properties which are checked as part of 
parsing. For example, PURELANG forbids degenerate function applications to zero
arguments: this can be expressed in the AST for PURELANG compiler expressions
but is ill-formed. Establishing preservation of the invariants is mostly mechani- 
cal, but quite tedious and long-winded. 


8 Benchmarks 


In this section we measure the efficacy of our inliner. In particular, we benchmark 
code generated by PureCake to determine how much the addition of the inliner 
improves runtime and memory overhead. 
[Figure 2: pipeline diagram, not reproduced. Languages, in pipeline order:
concrete syntax; PURELANG (ce; pure call-by-name, substitution semantics);
THUNKLANG (pure call-by-value, substitution semantics); ENVLANG (pure
call-by-value, environment semantics); STATELANG (impure call-by-value,
environment semantics); CakeML source. Compiler passes, in order from front end
to back end: lex, parse, desugar; binding group analysis; simplify; inline,
specialise loops (new); remove dead lets (new); type inference; simplify; demand
analysis; translate to call-by-value, introduce delay/force, avoid
delay (force (var _)); lift λ-abstractions out of delays; simplify forces;
reformulate to simplify compilation to STATELANG; compile delay/force and IO
monad to stateful ops; push _ · unit inwards; make every λ-abstraction bind a
variable; translate to CakeML, attach preamble.]

Fig. 2. High-level structure of the PureCake compiler. The inliner and its associated
clean up are PURELANG-to-PURELANG passes which take place immediately after
binding group analysis and before type inference.
Fig. 3. Graphs showing the performance impact of our inliner: the base-2 logarithm of
a ratio of measurements (execution time or heap allocations) with/without the inliner
enabled: $\log_2(m_{\text{disabled}} / m_{\text{enabled}})$. Error bars are too small to be visible.

Fig. 4. Graphs showing the performance impact of our inliner when executed after
PureCake’s demand analysis. Performance is clearly worse compared to Figure 3;
therefore we do not pursue this approach.


Methodology. We evaluate the performance of several benchmark programs with 
and without the inliner enabled, using an Intel® Xeon® E-2186G and 64 GB
RAM. We consider the same programs as presented by the PureCake developers 
in prior work [10, §7.1]. We also add a new suc_list program, which repeatedly 
applies the suc_list function shown in § 1 to a list of natural numbers. Like 
the PureCake developers, we measure wall-clock runtime and total heap alloca- 
tions as reported by the CakeML runtime. Our measurements are facilitated by 
existing benchmarking scripts found in the PureCake development. 


Table 1. Line counts for each part of our development.

    Part of development                                          kLoC
    Syntactic relation ($\leadsto$) and its soundness (§ 4)       2.6
    Equational theory behind specialisation (§ 5)                 4.0
    Implementation of inliner (incl. specialisation) (§ 6.2)      0.6
    Correctness of the implementation (§ 6.3)                     3.7
    Freshening and its correctness proof                          3.1
    Elimination of dead lets and its correctness proof            0.5
    Total                                                         ~15


Results. Figure 3 shows our results, plotted as two bar graphs: the left shows
runtime speedup, the right shows allocation reduction. In many cases, our inliner
significantly improves performance; in no case does it worsen performance.
The value for each plot is obtained by taking the base-2 logarithm of a ratio: the
measurement without the inliner enabled (i.e., the longer duration or greater
allocation) divided by the measurement with the inliner enabled. Expressed as
a percentage, the most significant improvements are: a ~20% reduction in the
runtime of life, and a ~15% reduction in the allocations of suc_list.
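The relationship between the plotted log-ratio and a percentage reduction can be checked with a few lines of Python. The function names and the sample numbers below are illustrative assumptions, not measurements from the benchmarks.

```python
import math

def log2_ratio(disabled, enabled):
    """Plotted value: log2 of (measurement without inliner / measurement with it)."""
    return math.log2(disabled / enabled)

def reduction_percent(log_ratio):
    """Convert a plotted log-ratio back into a percentage reduction."""
    return (1 - 2 ** (-log_ratio)) * 100
```

For example, a hypothetical runtime dropping from 10 s to 8 s is a 20% reduction and plots at $\log_2(1.25)$, roughly 0.32; a value of 0 means no change, and a value of 1 would mean the measurement halved.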


Inliner placement. We noted in § 7 that our inliner should run before PureCake’s 
demand analysis. Here, we justify that design choice. In particular, we benchmark 
a version of the PureCake compiler which runs our inliner directly after demand 
analysis. The results are shown in Figure 4. The improvements in runtime and 
memory overhead are reduced for several benchmarks, and in some cases runtime 
even worsens overall. Therefore, our inliner should run before demand analysis 
for maximum benefit. 


Code size and compile times. Simple measurements of code size show that our 
inliner can produce significantly larger CakeML programs (~50% increase); how- 
ever CakeML’s efficient handling of inserted lets reduces the effect for binaries 
(< 15% overall increase). Compile times are unaffected: these remain dominated 
by PureCake’s type-checking and CakeML’s register allocation. 


Line counts. Our work adds to PureCake significantly. Table 1 shows line counts 
for each part of our development, measured using wc -l.


9 Related Work 


Verified inlining in functional languages. CakeML [12] compiles a subset of Stan- 
dard ML (strict, impure) to several mainstream architectures with end-to-end 
guarantees. It performs function inlining in its second intermediate language, 
CLOSLANG, which has first-class closures. A flow analysis discovers invocations
of known functions, and simultaneously inlines closed functions which themselves 
do not contain closures. Use of de Bruijn indices sidesteps reasoning about shad- 
owing and freshening. As in our work, recursive applications of inlining improve 
the performance of higher-order functions; we go one step further with speciali- 
sation and the inlining of open terms which can contain A-abstractions. 
CertiCoq [2] verifiably compiles Gallina (the metalanguage of Coq) to Clight,
an intermediate language early in CompCert’s pipeline. One of its passes [4] 
performs several shrink reductions simultaneously: transformations that only 
reduce code size. One such reduction is the inlining of functions which are applied 
exactly once; in this case, inlining is β-reduction, contrary to our discussion in
§ 4.1. Restriction to shrink reductions further removes the need for a recursion 
limit as code size strictly decreases on each recursive call. Their verification 
relies on a more general rewrite system which permits inlining of functions which 
are used multiple times. A separate pass [16] further inlines small non-recursive 
functions which can be applied multiple times; here a key concern is maintenance 
of A-normal form expressions. In all proofs, the Barendregt variable convention 
(i.e., barendregt) is used to avoid name clashes. 

Pilsner [15] compiles a strict impure language to an idealised assembly, inlin- 
ing select top-level functions in its intermediate representation. Recursive func- 
tions can be unrolled in this way, but not specialised. Again, the Barendregt 
variable convention is enforced. The focus here is on the novel proof technique of 
parametric inter-language simulations (PILS) to enable compositional compiler 
correctness, where PureCake focuses on mechanised whole-program compiler cor- 
rectness for a realistic language. 


Other verified inlining passes. CompCert [13] compiles a subset of C99, perform- 
ing function inlining in its register transfer language (RTL). This control flow 
graph (CFG) representation differs considerably from the functional PureLang; 
inlining considers only top-level function declarations in the RTL setting. Rather 
than using a recursion limit, CompCert guarantees termination by forbidding in- 
lining of functions within their own bodies. 

CompCert also performs lazy code motion [19] within RTL. A special case 
of this transformation is loop-invariant code motion, which loosely resembles 
our specialisation: both are concerned with moving constant expressions out of 
loops, but in our functional setting loops are expressed as recursive functions. 
Their verification uses translation validation [18]: an unverified tool transforms 
code, and then per-run automation proves that semantics has been preserved. 

The Plutus Tx language from the Cardano blockchain platform resembles a 
subset of Haskell, and is compiled to a custom language known as Plutus Core. 
The compiler is implemented as a GHC plugin: GHC machinery first lowers 
Plutus Tx to a System F-like language, which is then optimised and compiled 
further. The compiler is verified using translation certification [11], which aims 
to make translation validation approaches less brittle by combining automated 
and manual proof. As in PureCake, syntactic relations are used to encapsu- 
late semantics-preserving transformations: automated proof shows that unver- 
ified code transformations inhabit the relations, and manual proof shows that 
the relations preserve semantics. Translation certification is robust to evolving 
compiler implementations because the syntactic proofs are more amenable to 
automated verification than the semantic ones. A syntactic relation akin to § 4 
justifies inlining; however, semantic verification is ongoing work at the time of 
writing. The Barendregt variable convention is enforced in this work too. 
Verified optimisation of realistic Haskell-like languages. The CoreSpec project* 
tackles verified variants of Haskell as implemented by GHC. For example, GHC’s 
dependent types extensions were proposed using formal specifications of the syn- 
tax, semantics, and typing rules of GHC’s Core language [20]. The unverified tool 
hs-to-coq [6] translates Haskell code to Gallina (Coq’s metalanguage), lever- 
aging Coq’s logic to enable equational reasoning about real-world programs. A 
future aim of the project is to derive Coq models of Core automatically from 
GHC’s implementation, prove correctness of optimisations within Coq, and in- 
tegrate the resulting verified code back into GHC as a plugin. Where CoreSpec 
focuses on accurate modelling of GHC with the loss of some trust, PureCake 
instead sacrifices faithfulness for end-to-end guarantees. 

GHC’s arity analysis pass [5] η-expands functions to avoid excessive thunk 
allocations. Its mechanised proof of correctness for a simplified Core language 
relies on an explicitly call-by-need semantics to show performance preservation, 
i.e., that η-expansion does not reduce value-sharing. 


10 Summary and Future Work 


This paper has described our work on a verified inlining and loop specialisation 
pass for PureLang, a lazy functional programming language. First, we verified 
a syntactic relation which defines an envelope of permitted inlining transforma- 
tions, independent of heuristic choices. We used a novel phrasing of inlining as 
the pushing in and pulling out of let-bindings to prove the relation sound 
using PureLang’s equational theory. Our inliner implementation is then proven 
to remain within this envelope. We have integrated our work into the Pure- 
Cake compiler, an end-to-end verified compiler, and demonstrated significant 
performance improvements. To the best of our knowledge, ours is the first ver- 
ified function inliner for a lazy functional programming language, and the first 
verified loop specialiser for any functional language. 

In future work, we intend to support loop unrolling and develop better heuris- 
tics that decide when to do inlining. Loop unrolling will probably involve aug- 
menting the definition of lets so that it can hold both let expressions and 
letrecs. Developing good heuristics will require some careful experimentation 
with the compiler implementation. We do not expect adjustment to the inliner’s 
heuristics to impact our correctness proofs in any significant way, since the proofs 
are designed to be independent of heuristic choices. 
Abstract. Universal probabilistic programming languages (PPLs) make 
it relatively easy to encode and automatically solve statistical inference 
problems. To solve inference problems, PPL implementations often ap- 
ply Monte Carlo inference algorithms that rely on execution suspension. 
State-of-the-art solutions enable execution suspension either through (i) 
continuation-passing style (CPS) transformations or (ii) efficient, but 
comparatively complex, low-level solutions that are often not available 
in high-level languages. CPS transformations introduce overhead due to 
unnecessary closure allocations—a problem the PPL community has gen- 
erally overlooked. To reduce overhead, we develop a new efficient selective 
CPS approach for PPLs. Specifically, we design a novel static suspen- 
sion analysis technique that determines parts of programs that require 
suspension, given a particular inference algorithm. The analysis allows 
selectively CPS transforming the program only where necessary. We for- 
mally prove the correctness of the analysis and implement the analysis 
and transformation in the Miking CorePPL compiler. We evaluate the 
implementation for a large number of Monte Carlo inference algorithms 
on real-world models from phylogenetics, epidemiology, and topic model- 
ing. The evaluation results demonstrate significant improvements across 
all models and inference algorithms. 


Keywords: Probabilistic programming · Static analysis · Continuation-passing style. 


1 Introduction 


Probabilistic programming languages (PPLs), such as Anglican [50], Birch [36], 
WebPPL [18], Stan [10], Pyro [6], and Gen [11], make it possible to encode and 
solve statistical inference problems. Such inference problems are of significant in- 
terest in many research fields, including phylogenetics [43], computer vision [25], 
© The Author(s) 2024 


S. Weirich (Ed.): ESOP 2024, LNCS 14577, pp. 302-330, 2024. 
https://doi.org/10.1007/978-3-031-57267-8_12 


Suspension Analysis and Selective CPS for Universal PPLs 303 


topic modeling [7], inverse graphics [20], and cognitive science [19]. A particu- 
larly appealing feature of PPLs is the separation between the inference problem 
specification (the language) and the inference algorithm used to solve the prob- 
lem (the language implementation). This separation allows PPL users to focus 
solely on encoding their inference problems while inference algorithm experts 
deal with the intricacies of inference implementation. 

Implementations of PPLs apply many different inference algorithms. Monte 
Carlo inference algorithms—such as Markov chain Monte Carlo (MCMC) [16] 
and sequential Monte Carlo (SMC) [12]—are popular due to their asymptotic 
correctness and relative ease of implementation for universal⁵ PPLs. The central 
idea behind all Monte Carlo methods in PPLs is to execute probabilistic 
programs multiple times to generate samples that approximate the target dis- 
tribution for the encoded inference problem. However, repeated execution is 
expensive, and PPL implementations must avoid unnecessary overhead. 

Monte Carlo algorithms often need to suspend executions. For example, 
MCMC algorithms can suspend at random draws in the program to avoid un- 
necessary re-execution when proposing new executions, and SMC algorithms 
can suspend at likelihood updates to resample executions. Languages such as 
WebPPL [18] and Anglican [50], and the approach described by Ritchie et 
al. [41], apply continuation-passing style (CPS) transformations [3] to enable 
arbitrary suspension during execution. The main benefit of CPS transforma- 
tions is that they are relatively easy to implement in functional programming 
languages. However, one disadvantage with CPS transformations is that high- 
performance low-level languages, without higher-order functions, do not support 
them. For this reason, there are also more direct low-level alternatives to CPS, 
including non-preemptive multitasking (e.g., coroutines [15]) and PPL control- 
flow graphs [30]. These more direct alternatives can additionally avoid much of 
the overhead resulting from CPS⁶, but are more complex to implement. 

We consider how to bridge the performance gap between CPS-based PPLs 
and lower-level PPLs that rely on, e.g., direct implementation of coroutines. We 
consider optimizations at the CPS transformation level, and not the transla- 
tion from CPS-based PPLs to lower-level representations. CPS overhead is a 
result of closure allocations for continuations. We make the important observa- 
tion that PPLs do not require the arbitrary suspensions provided by full CPS 
transformations. Most Monte Carlo inference algorithms require suspension only 
in very specific parts of programs. Current state-of-the-art CPS-based PPLs do 
not consider inference-specific suspension requirements to reduce CPS overhead. 

We design a new static suspension analysis and a new selective CPS trans- 
formation for PPLs that together significantly reduce runtime overhead compared 


5 A term that first appeared in Goodman et al. [17], indicating expressive PPLs where 
the number and types of random variables are not always known statically. 

6 Note that CPS only results in overhead if programs reify the continuations at 
runtime to, e.g., suspend computations. Traditional CPS-based compilers often only use 
CPS as an intermediate form during compilation, which does not result in runtime 
overhead. 
to a traditional full CPS transformation. Current state-of-the-art functional 
PPLs that use CPS for execution suspension can therefore greatly benefit 
from our new approach. The suspension analysis identifies all parts of programs 
that may require suspension as a result of applying a particular inference algo- 
rithm. We formalize the suspension analysis algorithm using a core PPL calculus 
equipped with a big-step operational semantics. Specifically, the challenge lies in 
capturing how suspension requirements propagate through the program in the 
presence of higher-order functions. Furthermore, we formalize the selective CPS 
transformation and justify its correctness when guided by the suspension anal- 
ysis. Prior work on selective CPS for general-purpose programming languages, 
e.g., by Nielsen [38] and Asai and Uehara [4], focuses on analyses based on type 
systems and type inference. In contrast, we instead build our suspension analysis 
using 0-CFA [46] and it operates directly on an untyped calculus. 


Overall, we (i) prove that the suspension analysis is correct, (ii) show that 
the resulting selective CPS transformation gives significant performance gains 
compared to using a full CPS transformation, and (iii) show that the overall ap- 
proach is directly applicable to a large set of inference algorithms. Specifically, we 
evaluate the approach for the following inference algorithms: likelihood weight- 
ing, the SMC bootstrap particle filter, the SMC alive particle filter [24], aligned 
lightweight MCMC [29,49], and particle-independent Metropolis–Hastings [40]. 
We consider each inference algorithm for four real-world models from phyloge- 
netics, epidemiology, and topic modeling. 

We implement the suspension analysis and selective CPS transformation in 
Miking CorePPL [30,9]. Similarly to WebPPL and Anglican, the implementa- 
tion supports the co-existence of many inference problems and applications of 
inference algorithms to these problems within the same program. However, com- 
pared to full CPS, such programs are more challenging to handle with selective 
CPS, as the CPS transformation of an inference problem also depends on the ap- 
plied inference algorithm—different inference algorithms generally require differ- 
ent suspensions. To complicate things further, different inference problems may 
share some code, or the PPL user may apply two different inference algorithms 
to the same inference problem. The compiler must then apply different CPS 
transformations to different parts of the program, and sometimes even many 
different CPS transformations to separate copies of the same part of the pro- 
gram. To solve this, we develop an approach that, for any given Miking CorePPL 
program, extracts all possible inference problems and corresponding inference al- 
gorithm applications. This extraction procedure allows the correct application 
of selective CPS throughout the program. 


In summary, we make the following contributions. 


— We design, formalize, and prove the correctness of a suspension analysis 
for PPLs, where the suspension requirements come from a given inference 
algorithm (Section 4). 

— We design and formalize a new selective CPS transformation for PPLs. Com- 
pared to full CPS, selectively CPS transforming PPL programs guided by 
the suspension analysis significantly reduces runtime overhead resulting from 
unnecessary closure allocations (Section 5). 

— We implement the suspension analysis and selective CPS transformation in 
the Miking CorePPL compiler. Unlike full CPS, selective CPS introduces 
challenges for probabilistic programs containing many inference problems 
and inference algorithm applications. We implement an approach that cor- 
rectly applies selective CPS to such programs by extracting individual infer- 
ence problems (Section 6). 


Section 7 presents the evaluation and its results for the implementations in Mik- 
ing CorePPL, Section 8 discusses related work in more detail, and Section 9 
concludes. We first consider a motivating example in Section 2 and introduce 
the underlying PPL calculus in Section 3. 

An extended version of the paper is available at arXiv [31]. We use the † 
symbol in the text to indicate that more information (e.g., proofs) is available 
in the extended version. 


2 A Motivating Example 


This section introduces the running example in Fig. 1 and uses it to present the 
basic idea behind PPLs and how inference algorithms such as SMC and MCMC 
make use of CPS to suspend executions. Most importantly, we illustrate the 
motivation and key ideas behind selective CPS for PPLs. 

Consider the probabilistic program in Fig. 1a, written in a functional-style 
PPL. The program encodes an inference problem for estimating the probability 
distribution over the bias of a coin, conditioned on the outcome of four 
experimental coin flips: true, true, false, and true (true = heads and false = tails). 
At line 1, we use the PPL-specific assume construct to define our prior belief 
in the bias a₁ of the coin. We set this prior belief to a Beta(2,2) probability 
distribution, illustrated in Fig. 1b. In the illustration, 0 indicates a coin that 
always results in false, 1 a coin that always results in true, and 0.5 a fair coin. 
We see that our prior belief is quite evenly spread out, but with more probability 
mass towards a fair coin. To condition this prior distribution on the observed 
coin flips, we conceptually execute the program in Fig. 1a infinitely many times, 
sampling values from the prior Beta distribution at assume (line 1) and, as a 
side effect, accumulating the product of weights given as argument to the 
PPL-specific weight construct (line 4). We make the four consecutive calls weight 
(fBernoulli a₁ true), weight (fBernoulli a₁ true), weight (fBernoulli a₁ false), 
and weight (fBernoulli a₁ true)⁷, using the recursive function iter. The function 
application fBernoulli a₁ o gives the probability of the outcome o given a 
bias a₁ for the coin. That is, fBernoulli a₁ true = a₁ and fBernoulli a₁ false = 1 − a₁. 
So, for example, a sample a₁ = 0.4 gets the accumulated weight 0.4·0.4·0.6·0.4 


7 PPLs also commonly use a similar built-in function observe to update the weight. 
For example, observe (Bernoulli a₁) true is equivalent to weight (fBernoulli a₁ 
true). 
1 let a₁ = assume (Beta 2 2) in 
2 let rec iter = λobs. 
3   if null obs then () else 
4     weight (fBernoulli a₁ (head obs)); 
5     iter (tail obs) 
6 in 
7 iter [true, true, false, true]; 
8 a₁ 

(a) Program t_example. 

(b) Beta(2,2). [Density plot; horizontal axis with ticks at 0, 0.5, and 1.] 

(c) Distribution of t_example. [Density plot; horizontal axis with ticks at 0, 0.5, and 1.] 

1 Suspension_assume(Beta 2 2, λa₁. 
2   let rec iter = λobs. 
3     if null obs then () else 
4       weight (fBernoulli(a₁) 
5         (head obs)); 
6       iter (tail obs) 
7   in 
8   iter [true, true, false, true]; 
9   a₁) 

(d) Suspension at assume. 

 1 let a₁ = assume (Beta 2 2) in 
 2 let rec iter = λk. λobs. 
 3   if null obs then k () 
 4   else 
 5     Suspension_weight( 
 6       fBernoulli(a₁) (head obs), 
 7       (λ_. iter k (tail obs))) 
 8 in 
 9 iter (λ_. a₁) 
10   [true, true, false, true] 

(e) Suspension at weight. 

 1 let k₇ = λt₆. 
 2   let k₈ = λt₇. 
 3     Suspension_assume(t₇, λa₁. 
 4       let rec iter = λk₁. λobs. 
 5         let k₂ = λt₁. 
 6           if t₁ then k₁ () else 
 7             let k₃ = λt₂. 
 8               let k₄ = λt₃. 
 9                 let k₅ = λt₄. 
10                   Suspension_weight(t₄, λ_. 
11                     let k₆ = λt₅. iter k₁ t₅ in 
12                     tail_cps k₆ obs) 
13                 in t₂ k₅ t₃ 
14               in head_cps k₄ obs 
15             in fBernoulli_cps k₃ a₁ 
16         in null_cps k₂ obs 
17       in iter (λ_. a₁) 
18         [true, true, false, true]) 
19   in t₆ k₈ 2 
20 in Beta_cps k₇ 2 

(f) Full CPS. 


Fig. 1: A probabilistic program t_example modeling the bias of a coin. Fig. (a) 
gives the program. The function fBernoulli is the probability mass function of 
the Bernoulli distribution. Fig. (b) illustrates the distribution for a₁ at line 1 
in (a). Fig. (c) shows the set of (weighted) samples resulting from conceptually 
running t_example infinitely many times. Fig. (d) and Fig. (e) show the selective 
CPS transformations required for suspension at assume and weight, respectively. 
Fig. (f) gives t_example in full CPS, with suspensions at assume and weight. The 
cps subscript indicates CPS-versions of intrinsic functions such as head and tail. 
and a₁ = 0.7 the accumulated weight 0.7·0.7·0.3·0.7. The end result is an infinite 
set of weighted samples of a₁ (the program returns a₁ at line 8) that approximate 
the posterior or target distribution of Fig. 1a, illustrated in Fig. 1c. Note 
that, because we observed three true outcomes and only one false, the weights 
shift the probability mass towards 1 and narrow it slightly as we are now more 
sure about the bias of the coin. Increasing the number of experimental coin flips 
would make Fig. 1c more and more narrow. 
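
The weight accumulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Miking CorePPL implementation: the names f_bernoulli, accumulated_weight, likelihood_weighting, and weighted_mean are hypothetical, and Python's random.betavariate stands in for assume (Beta 2 2). 

```python
import random

OBSERVATIONS = [True, True, False, True]  # the four coin flips


def f_bernoulli(bias, outcome):
    """Probability of `outcome` for a coin with the given bias (True = heads)."""
    return bias if outcome else 1.0 - bias


def accumulated_weight(bias, observations=OBSERVATIONS):
    """Product of the weights passed to `weight` during one run of the program."""
    weight = 1.0
    for outcome in observations:
        weight *= f_bernoulli(bias, outcome)
    return weight


def likelihood_weighting(num_samples, rng):
    """Run the program num_samples times and collect (sample, weight) pairs."""
    samples = []
    for _ in range(num_samples):
        bias = rng.betavariate(2, 2)  # stands in for: assume (Beta 2 2)
        samples.append((bias, accumulated_weight(bias)))
    return samples


def weighted_mean(samples):
    """Weighted average of the sampled biases."""
    total = sum(w for _, w in samples)
    return sum(a * w for a, w in samples) / total
```

For a large number of runs, weighted_mean(likelihood_weighting(n, random.Random(0))) should approach 5/8, the mean of the Beta(5,3) distribution that conjugacy gives as the exact posterior for this model. 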


We can approximate the infinite number of samples by running the program 
a large (but finite) number of times. This basic inference algorithm is known 
as likelihood weighting. The problem with likelihood weighting is that it is only 
accurate enough for simple models. For complex models, it is common that only 
a few likelihood weighting samples (often only one) get much larger weights rela- 
tive to the other samples, greatly reducing inference accuracy. Real-world models 
require more powerful inference algorithms based on, e.g., SMC or MCMC. A 
key requirement in both SMC and MCMC is the ability to suspend executions 
of probabilistic programs at calls to weight and/or assume. One way to enable 
suspensions is by writing programs in CPS. We first illustrate a simple use of 
CPS to suspend at assume in Fig. 1d. Here, the program immediately returns 
an object Suspension_assume(Beta 2 2, k), indicating that execution stopped at an 
assume with the argument Beta 2 2 and a continuation k (i.e., the abstraction 
binding a₁) that executes the remainder of the program. With likelihood weighting, 
we would simply sample a value a₁ from the Beta 2 2 distribution and resume 
execution by calling k a₁. This call then runs the program until termination and 
results in the actual return value of the program, which is a₁. Many MCMC 
inference algorithms reuse samples from previous executions at Suspension_assume, 
and the suspensions are thus useful to avoid unnecessary re-execution [41]. 
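
As a rough illustration of suspension at assume, the following Python sketch mirrors the shape of Fig. 1d. It makes several assumptions that are not part of the paper: the class SuspensionAssume, the tuple encoding of the distribution, the weight callback, and the driver run_once are all hypothetical names chosen for this example. 

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SuspensionAssume:
    dist: Tuple[str, float, float]  # hypothetical encoding, e.g. ("Beta", 2, 2)
    k: Callable[[float], float]     # continuation: the remainder of the program


def f_bernoulli(bias, outcome):
    return bias if outcome else 1.0 - bias


def example_suspend_at_assume(weight):
    """Program shaped like Fig. 1d; `weight` is a callback that records weights."""
    def k(a1):
        def iter_(obs):
            if not obs:
                return None
            weight(f_bernoulli(a1, obs[0]))
            return iter_(obs[1:])
        iter_([True, True, False, True])
        return a1
    # The program returns immediately, before any weight is recorded.
    return SuspensionAssume(("Beta", 2, 2), k)


def run_once(rng):
    """Likelihood-weighting driver: sample at the suspension, then resume."""
    weights = []
    susp = example_suspend_at_assume(weights.append)
    _, alpha, beta = susp.dist
    a1 = rng.betavariate(alpha, beta)  # the random draw happens in the driver
    result = susp.k(a1)                # run the rest of the program
    return result, weights
```

An MCMC-style driver could instead keep the suspension object and call susp.k again with a different proposed value, without re-running anything that precedes the assume. 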


As a second example, we illustrate suspension at weight for, e.g., SMC inference 
in Fig. 1e. Here, we require suspensions in the middle of the recursive 
call to iter, and writing the program in CPS is more challenging. We rewrite the 
iter function to take a continuation k as argument, and call the continuation 
with the return value () at line 3 instead of directly returning () as in Fig. 1a at 
line 3. This continuation argument k is precisely what allows us to construct and 
return Suspension_weight objects at line 5. To illustrate the suspensions, consider 
executing the program with likelihood weighting. First, the program returns the 
object Suspension_weight(fBernoulli(a₁) true, k'), where k' is the continuation that 
line 7 constructs. Likelihood weighting now updates the weight for the execution 
with the value fBernoulli(a₁) true and resumes execution by calling k' (). 
Similarly, this next execution returns Suspension_weight(fBernoulli(a₁) true, k'') for 
the second recursive call to iter, and we again update the weight and resume 
by calling k'' (). We similarly encounter Suspension_weight(fBernoulli(a₁) false, k''') 
and Suspension_weight(fBernoulli(a₁) true, k'''') before the final call k'''' () runs 
the program to termination and produces the actual return value a₁. In SMC, 
we run many executions concurrently and wait until they all have returned a 
Suspension_weight object. At this point, we resample the executions according 
to their weights (the first value in Suspension_weight), which discards executions 
with low weight and replicates executions with high weight. After resampling, 
we continue to the next suspension and resampling by calling the continuations. 
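
The suspend-and-resume loop for weight can be sketched in Python along the lines of Fig. 1e. The sketch is an assumption-laden illustration: SuspensionWeight, example_suspend_at_weight, and run_to_completion are hypothetical names, continuations are modelled as zero-argument Python closures rather than abstractions taking a unit argument, and the bias a₁ is passed in as an already-sampled number. 

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SuspensionWeight:
    weight: float            # the value given to `weight`
    k: Callable[[], object]  # continuation that resumes the execution


def f_bernoulli(bias, outcome):
    return bias if outcome else 1.0 - bias


def example_suspend_at_weight(a1):
    """Program body shaped like Fig. 1e, for an already-sampled bias a1."""
    def iter_(k, obs):
        if not obs:
            return k()  # call the continuation instead of returning ()
        return SuspensionWeight(
            f_bernoulli(a1, obs[0]),
            lambda: iter_(k, obs[1:]))  # continuation built at each suspension
    return iter_(lambda: a1, [True, True, False, True])


def run_to_completion(a1):
    """Likelihood-weighting driver: update the weight and resume every time."""
    total, suspensions = 1.0, 0
    state = example_suspend_at_weight(a1)
    while isinstance(state, SuspensionWeight):
        total *= state.weight
        suspensions += 1
        state = state.k()
    return state, total, suspensions
```

An SMC driver would differ only in the loop: it would advance many such executions to their next SuspensionWeight, resample them according to the weight fields, and then call the surviving continuations. 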

PPL implementations enable suspensions at assume and/or weight through 
automatic and full CPS transformations. Fig. 1f illustrates such a transformation 
for Fig. 1a. We indicate CPS versions of intrinsic functions with the cps 
subscript. Note that the full CPS transformation results in many additional 
closure allocations compared to Fig. 1d and Fig. 1e. As a result, runtime overhead 
increases significantly. The contribution in this paper is a static analysis 
that allows an automatic and selective CPS transformation of programs, as in 
Fig. 1d and Fig. 1e. With a selective transformation, we avoid many unnecessary 
closure allocations, and can significantly reduce runtime overhead while still 
allowing suspensions as required for a given inference algorithm. 


3 Syntax and Semantics 


This section introduces the PPL calculus used to formalize the suspension anal- 
ysis in Section 4 and selective CPS transformation in Section 5. Section 3.1 gives 
the abstract syntax and Section 3.2 a big-step operational semantics. Section 3.3 
introduces A-normal form—a prerequisite for both the suspension analysis and 
the selective CPS transformation. 


3.1 Syntax 


We build upon the standard untyped lambda calculus, representative of func- 
tional universal PPLs such as Anglican, WebPPL, and Miking CorePPL. We 
define the abstract syntax below. 


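Definition 1 (Terms, values, and environments). We define terms t ∈ T 
and values v ∈ V as 

\[
\begin{aligned}
t ::={} & x \mid c \mid \lambda x.\, t \mid t\ t \mid \mathsf{let}\ x = t\ \mathsf{in}\ t \\
        & \mid \mathsf{if}\ t\ \mathsf{then}\ t\ \mathsf{else}\ t \mid \mathsf{assume}\ t \mid \mathsf{weight}\ t \\
v ::={} & c \mid \langle \lambda x.\, t, \rho \rangle \\
        & x, y \in X \qquad \rho \in P \qquad c \in C \qquad \{\mathsf{false}, \mathsf{true}, ()\} \cup \mathbb{R} \cup D \subset C.
\end{aligned}
\tag{1}
\]


The countable set X contains variable names, C intrinsic values and operations, 
and D ⊂ C intrinsic probability distributions. The set P contains evaluation 
environments, i.e., maps from variables in X to values in V. 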


Definition 2 (Target language terms). As a target language for the selective 
CPS transformation in Section 5, we additionally extend Definition 1 to target 
language terms t ∈ T⁺ by 

\[
t \mathrel{+}= \mathsf{Suspension}_{\mathsf{assume}}(t, t) \mid \mathsf{Suspension}_{\mathsf{weight}}(t, t).
\tag{2}
\]


Fig. 1a gives an example of a term in T, and Fig. 1d and Fig. 1e of terms in 
T⁺. However, note that the programs in Fig. 1 also use the list constructor [...] 
(not part of the above definitions) to make the example more interesting. 
In addition to the standard variable, abstraction, and application terms in the 
untyped lambda calculus, we include explicit let expressions for convenience. 
Furthermore, we use the syntactic sugar let rec f = λx. t₁ in t₂ to define 
recursive functions (translating to an application of a call-by-value fixed-point 
combinator). We use t₁; t₂ as a shorthand for (λ_. t₂) t₁, where _ means that 
we do not use the argument. That is, we evaluate t₁ for side effects only. 

We include a set C of intrinsic operations and constants essential to inference 
problems encoded in PPLs. The set of intrinsics includes boolean truth values, 
the unit value, real numbers, and probability distributions. We can also add 
further operations and constants to C. For example, we can let + ∈ C to support 
addition of real numbers. To allow control flow to depend on intrinsic values, we 
include if expressions that use intrinsic booleans as condition. 

We saw examples of the assume and weight constructs in Section 2. The 
assume construct takes distributions D ⊂ C as argument, and produces random 
variables distributed according to these distributions. For example, we can let 
𝒩 ∈ C be a function that constructs normal distributions. Then, assume (𝒩 0 1), 
where 𝒩 0 1 ∈ D, defines a random variable with a standard normal 
distribution. Partially constructed distributions, e.g., 𝒩 0, are also in C, but 
not in D (they are not yet proper distributions). As we saw in Section 2, the 
weight construct updates the likelihood with the real number given as argument, 
and allows conditioning on data (e.g., the four coin flips in Fig. 1). 


3.2 Semantics 


We construct a call-by-value big-step operational semantics, based on Lundén 
et al. [29], describing how to evaluate terms t ∈ T. Such a semantics is a key 
component when formally defining the probability distributions corresponding 
to terms t ∈ T (e.g., the distribution in Fig. 1c corresponding to the program in 
Fig. 1a) and also when proving various properties of PPLs and their inference 
algorithms (e.g., inference correctness). See, e.g., the work by Borgström et al. [8] 
and Lundén et al. [28] for full formal treatments. 

We use the semantics to formally define suspension, and use this definition 
to state the soundness of the suspension analysis in Section 4 (Theorem 1). We 
use a big-step semantics, as we do not require the additional control provided by 
a small-step semantics. For example, we do not concern ourselves with details 
of termination, as the soundness of the analysis relates only to terminating 
executions. Fig. 2 presents the full semantics as a relation 
$\rho \vdash t \;{}^{s}{\Downarrow}^{u}_{w}\; v$ over tuples 
(P, T, S, {false, true}, ℝ, V). S is a set of traces capturing the random draws at 
assume during evaluation. Intuitively, $\rho \vdash t \;{}^{s}{\Downarrow}^{u}_{w}\; v$ holds iff t evaluates to v in 
the environment ρ with the trace s and the total probability density (i.e., the 
accumulated weight) w. We describe the suspension flag u later in this section. 

Most of the rules are standard and we focus on explaining key properties 
related to PPLs and suspension. We first consider the rule (CONST-APP), which 
uses the δ-function to evaluate intrinsic operations. 
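\[
\frac{}{\rho \vdash x \;{}^{[]}{\Downarrow}^{\mathrm{false}}_{1}\; \rho(x)}\ (\textsc{Var})
\qquad
\frac{}{\rho \vdash c \;{}^{[]}{\Downarrow}^{\mathrm{false}}_{1}\; c}\ (\textsc{Const})
\qquad
\frac{}{\rho \vdash \lambda x.\, t \;{}^{[]}{\Downarrow}^{\mathrm{false}}_{1}\; \langle \lambda x.\, t, \rho \rangle}\ (\textsc{Lambda})
\]

\[
\frac{\rho \vdash t_1 \;{}^{s_1}{\Downarrow}^{u_1}_{w_1}\; \langle \lambda x.\, t, \rho' \rangle
      \qquad \rho \vdash t_2 \;{}^{s_2}{\Downarrow}^{u_2}_{w_2}\; v_2
      \qquad \rho', x \mapsto v_2 \vdash t \;{}^{s_3}{\Downarrow}^{u_3}_{w_3}\; v}
     {\rho \vdash t_1\ t_2 \;{}^{s_1 \| s_2 \| s_3}{\Downarrow}^{u_1 \lor u_2 \lor u_3}_{w_1 \cdot w_2 \cdot w_3}\; v}\ (\textsc{App})
\]

\[
\frac{\rho \vdash t_1 \;{}^{s_1}{\Downarrow}^{u_1}_{w_1}\; c_1
      \qquad \rho \vdash t_2 \;{}^{s_2}{\Downarrow}^{u_2}_{w_2}\; c_2}
     {\rho \vdash t_1\ t_2 \;{}^{s_1 \| s_2}{\Downarrow}^{u_1 \lor u_2}_{w_1 \cdot w_2}\; \delta(c_1, c_2)}\ (\textsc{Const-App})
\]

\[
\frac{\rho \vdash t_1 \;{}^{s_1}{\Downarrow}^{u_1}_{w_1}\; v_1
      \qquad \rho, x \mapsto v_1 \vdash t_2 \;{}^{s_2}{\Downarrow}^{u_2}_{w_2}\; v}
     {\rho \vdash \mathsf{let}\ x = t_1\ \mathsf{in}\ t_2 \;{}^{s_1 \| s_2}{\Downarrow}^{u_1 \lor u_2}_{w_1 \cdot w_2}\; v}\ (\textsc{Let})
\]

\[
\frac{\rho \vdash t_1 \;{}^{s_1}{\Downarrow}^{u_1}_{w_1}\; \mathsf{true}
      \qquad \rho \vdash t_2 \;{}^{s_2}{\Downarrow}^{u_2}_{w_2}\; v}
     {\rho \vdash \mathsf{if}\ t_1\ \mathsf{then}\ t_2\ \mathsf{else}\ t_3 \;{}^{s_1 \| s_2}{\Downarrow}^{u_1 \lor u_2}_{w_1 \cdot w_2}\; v}\ (\textsc{If-True})
\]

\[
\frac{\rho \vdash t \;{}^{s}{\Downarrow}^{u}_{w}\; d \qquad w' = f_d(c)}
     {\rho \vdash \mathsf{assume}\ t \;{}^{s \| [c]}{\Downarrow}^{\mathit{suspend}_{\mathsf{assume}} \lor u}_{w \cdot w'}\; c}\ (\textsc{Assume})
\qquad
\frac{\rho \vdash t \;{}^{s}{\Downarrow}^{u}_{w}\; w'}
     {\rho \vdash \mathsf{weight}\ t \;{}^{s}{\Downarrow}^{\mathit{suspend}_{\mathsf{weight}} \lor u}_{w \cdot w'}\; ()}\ (\textsc{Weight})
\]


Fig. 2: A big-step operational semantics for t ∈ T. We omit the rule (IF-FALSE) 
for brevity; it is analogous to (IF-TRUE). The environment ρ, x ↦ v denotes ρ 
extended with a binding v for x. For each d ∈ D, the function f_d is its probability 
density or probability mass function. E.g., f_{𝒩(0,1)}(x) = e^{−x²/2}/√(2π), the density 
function of the standard normal distribution. We use the following notation: ‖ 
for sequence concatenation, · for multiplication, and ∨ for logical disjunction. 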


Definition 3 (Intrinsic arities and the δ-function). For each c ∈ C, we 
let |c| ∈ ℕ denote its arity. We also assume the existence of a partial function 
δ : C × C → C such that if δ(c, c₁) = c₂, then |c| > 0 and |c₂| = |c| − 1. 


For example, δ(δ(+, 1), 2) = 3. We use the arity property of intrinsics to 
formally define traces. 
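
To make Definition 3 concrete, the following Python sketch models intrinsics with an arity and a partial δ-function that applies one argument at a time. The encoding is an assumption made for illustration only: the class Op, the helper arity, and the constant PLUS are hypothetical names, and plain Python values play the role of arity-0 intrinsics. 

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Op:
    name: str
    arity: int                     # number of arguments still expected (> 0)
    fn: Callable[..., object]      # the underlying operation
    args: Tuple[object, ...] = ()  # arguments collected so far


def arity(c):
    """|c|: plain values (numbers, booleans, ...) have arity 0."""
    return c.arity if isinstance(c, Op) else 0


def delta(c, c1):
    """Partial function: only defined when |c| > 0; the result has arity |c| - 1."""
    if arity(c) == 0:
        raise ValueError("delta is undefined for intrinsics of arity 0")
    args = c.args + (c1,)
    if c.arity == 1:
        return c.fn(*args)         # fully applied: compute the result
    return Op(c.name, c.arity - 1, c.fn, args)


PLUS = Op("+", 2, lambda x, y: x + y)
```

With these definitions, delta(delta(PLUS, 1), 2) evaluates to 3, and the intermediate value delta(PLUS, 1) has arity 1, matching |c₂| = |c| − 1. 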


Definition 4 (Traces). For all s ∈ S, s is a sequence of intrinsics with arity 0, 
called a trace. We write s = [c₁, c₂, ..., cₙ] to denote a trace s with n elements. 


The rule (ASSUME) formalizes random draws and consumes elements of the trace. 
Specifically, (ASSUME) updates the evaluation’s total probability density w ∈ ℝ 
with the density w’ of the first trace element with respect to the distribution 
given as argument to assume. The rule (WEIGHT) furthermore directly modifies 
the total probability density according to the weight argument. 

We now consider the special suspension flag u in the derivation $\rho \vdash t \;{}^{s}{\Downarrow}^{u}_{w}\; v$. 


Definition 5 (Suspension requirement). A derivation $\rho \vdash t \;{}^{s}{\Downarrow}^{u}_{w}\; v$ requires 
suspension if the suspension flag u is true. 


For example, the rule (APP) requires suspension if u₁ ∨ u₂ ∨ u₃, i.e., if any 
subderivation requires suspension. To reflect the particular suspension requirements 
in SMC and MCMC inference, we limit the source of suspension requirements 
to assume and weight. We turn the individual sources on and off through the 
 1 let t₁ = 2 in 
 2 let t₂ = 2 in 
 3 let t₃ = Beta in 
 4 let t₄ = t₃ t₁ in 
 5 let t₅ = t₄ t₂ in 
 6 let a₁ = assume t₅ in 
 7 let rec iter = λobs. 
 8   let t₆ = null in 
 9   let t₇ = t₆ obs in 
10   let t₈ = 
11     if t₇ then 
12       let t₉ = () in 
13       t₉ 
14     else 
15       let t₁₀ = fBernoulli in 
16       let t₁₁ = t₁₀ a₁ in 
17       let t₁₂ = head in 
18       let t₁₃ = t₁₂ obs in 
19       let t₁₄ = t₁₁ t₁₃ in 
20       let w₁ = weight t₁₄ in 
21       let t₁₅ = tail in 
22       let t₁₆ = t₁₅ obs in 
23       let t₁₇ = iter t₁₆ in 
24       t₁₇ 
25   in 
26   t₈ 
27 in 
28 let t₁₈ = true in 
29 let t₁₉ = false in 
30 let t₂₀ = true in 
31 let t₂₁ = true in 
32 let t₂₂ = [t₂₁, t₂₀, t₁₉, t₁₈] in 
33 let t₂₃ = iter t₂₂ in 
34 a₁ 


Fig. 3: The running example t_example from Fig. 1a transformed to ANF. 


boolean variables suspend assume ANd suspend,eign, IN Fig. 2. For the examples in 
the remainder of this paper, we let suspend,.ign_ = true and suspend = false 
(i.e., only weight requires suspension, as in SMC inference). 

To illustrate the semantics, consider $t_{\mathit{example}}$ of Fig. 1a again. Because $t_{\mathit{example}}$
evaluates precisely one assume, the only valid traces for $t_{\mathit{example}}$ are singleton
traces $[a_1]$, where $a_1 \in \mathbb{R}_{[0,1]}$ due to the Beta prior for $a_1$. By initially setting
$\rho$ to the empty environment $\varnothing$ and following the rules of Fig. 2, we derive

$$\varnothing \vdash t_{\mathit{example}} \; {}^{[a_1]}\!\Downarrow^{\mathit{true}}_{f_{\mathrm{Beta}(2,2)}(a_1) \cdot a_1^3 (1 - a_1)} \; a_1.$$

Note that every evaluation of $t_{\mathit{example}}$ has $u = \mathit{true}$, as there are always four
calls to weight during evaluation. That is, the
derivation requires suspension. However, many subderivations of $t_{\mathit{example}}$ do not
require suspension. For example, the subderivations assume (Beta 2 2) and
null obs do not (i.e., have $u = \mathit{false}$). Section 4 presents a suspension analysis
that conservatively approximates which subderivations require suspension. The
analysis enables, e.g., the selective CPS transformation in Fig. 1e.
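As a worked instance of the rules (ASSUME) and (WEIGHT), the following Python sketch evaluates the running example against a trace and returns the value, the accumulated density $w$, and the suspension flag $u$. It is a hand-written model of this one program, not an interpreter for the calculus; the helper names `beta_pdf`, `bernoulli_pmf`, and `run_example` are hypothetical, and the observation list is the one bound at line 32 of Fig. 3.

```python
import math

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x (zero outside [0, 1])."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def bernoulli_pmf(p: float, outcome: bool) -> float:
    """Probability of `outcome` under Bernoulli(p)."""
    return p if outcome else 1.0 - p

def run_example(trace, obs):
    """Evaluate the running example against a trace.

    Returns (value, w, u) under suspend_weight = true and
    suspend_assume = false.
    """
    if len(trace) != 1:
        raise ValueError("the example consumes exactly one trace element")
    w, u = 1.0, False
    a1 = trace[0]                      # (ASSUME) consumes the trace element
    w *= beta_pdf(a1, 2.0, 2.0)        # ... and multiplies in its density
    for o in obs:                      # iter walks the observation list
        w *= bernoulli_pmf(a1, o)      # (WEIGHT) multiplies in the weight
        u = True                       # weight requires suspension
    return a1, w, u

value, w, u = run_example([0.5], [True, True, False, True])
```

For the trace $[0.5]$, the density is $f_{\mathrm{Beta}(2,2)}(0.5) \cdot 0.5^3 \cdot (1 - 0.5) = 1.5 \cdot 0.0625 = 0.09375$, and the flag is true because weight is evaluated.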




3.3 A-Normal Form 


We simplify the suspension analysis in Section 4 and the selective CPS transfor- 
mation in Section 5 by requiring that terms are in A-normal form (ANF) [13]. 


Definition 6 (A-normal form). We define the A-normal form terms $t_{\mathit{ANF}} \in T_{\mathit{ANF}}$ as follows.

$$\begin{aligned}
t_{\mathit{ANF}} &::= x \mid \texttt{let } x = t'_{\mathit{ANF}} \texttt{ in } t_{\mathit{ANF}} \\
t'_{\mathit{ANF}} &::= x \mid c \mid \lambda x.\, t_{\mathit{ANF}} \mid x \; y \\
&\quad \mid \texttt{if } x \texttt{ then } t_{\mathit{ANF}} \texttt{ else } t_{\mathit{ANF}} \mid \texttt{assume } x \mid \texttt{weight } x
\end{aligned} \tag{3}$$

It holds that $T_{\mathit{ANF}} \subset T$. Furthermore, there exist standard transformations to
convert terms in $T$ to $T_{\mathit{ANF}}$. Fig. 3 illustrates Fig. 1a transformed to ANF. We
will use Fig. 1a as a running example in Section 4 and Section 5.
Restricting programs to ANF significantly simplifies the suspension analysis
and selective CPS transformation. From now on we require that all variable
bindings in programs are unique, and together with ANF, the result is that
every expression in a program $t \in T_{\mathit{ANF}}$ is uniquely labeled by a variable name
from a let expression. This property is essential for the treatment in Section 4.
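One possible version of such a standard transformation is sketched below in Python, for terms encoded as nested tuples. The encoding and the names `fresh`, `flatten`, and `to_anf` are assumptions made for this illustration; the order in which the sketch numbers its variables differs from Fig. 3, which does not affect the ANF property.

```python
import itertools

_fresh = itertools.count(1)

def fresh() -> str:
    """Generate a unique variable name t1, t2, ..."""
    return f"t{next(_fresh)}"

def to_anf(t):
    """Convert a term to ANF: a chain of lets ending in a variable."""
    bindings, name = flatten(t)
    body = ("var", name)
    for x, rhs in reversed(bindings):
        body = ("let", x, rhs, body)
    return body

def flatten(t):
    """Return (bindings, name): the lets to emit and the variable naming t."""
    tag = t[0]
    if tag == "var":
        return [], t[1]
    if tag == "const":
        x = fresh()
        return [(x, t)], x
    if tag == "lam":                     # the body becomes its own ANF term
        x = fresh()
        return [(x, ("lam", t[1], to_anf(t[2])))], x
    if tag == "app":                     # name function and argument first
        bf, f = flatten(t[1])
        ba, a = flatten(t[2])
        x = fresh()
        return bf + ba + [(x, ("app", f, a))], x
    if tag == "if":                      # only the condition is hoisted
        bc, c = flatten(t[1])
        x = fresh()
        return bc + [(x, ("if", c, to_anf(t[2]), to_anf(t[3])))], x
    if tag in ("assume", "weight"):
        b, y = flatten(t[1])
        x = fresh()
        return b + [(x, (tag, y))], x
    raise ValueError(f"unknown term: {t!r}")

# assume (Beta 2 2), written with curried application.
term = ("assume",
        ("app", ("app", ("const", "Beta"), ("const", 2)), ("const", 2)))
anf = to_anf(term)
```

Every intermediate result receives its own let-bound name, which is the labeling property that the analysis in Section 4 relies on.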


4 Suspension Analysis 


This section presents the main technical contribution: the suspension analysis. 
The analysis goal is to identify program expressions that may require suspension 
in the sense of Definition 5. Identifying such expressions leads to the selective 
CPS transformation in Section 5, enabling transformations such as in Fig. 1e.

The suspension analysis builds upon the 0-CFA algorithm [46,39], and we 
formalize our algorithms based on Lundén et al. [29]. The main challenge we 
solve is how to model the propagation of suspension in the presence of higher- 
order functions. The 0 in 0-CFA stands for context insensitivity—the analysis 
considers every part of the program in one global context. Context insensitivity 
makes the analysis more conservative compared to context-sensitive approaches 
such as k-CFA, where $k \in \mathbb{N}$ indicates the level of context sensitivity [33]. We
use 0-CFA for two reasons: (i) the worst-case time complexity for the analysis 
is polynomial, while it is exponential for k-CFA already at k = 1, and (ii) the 
limitations of 0-CFA rarely matter in practical PPL applications. For example, 
k-CFA provides no benefits over 0-CFA for the programs in Section 7. 

We assume $(\lambda x.\, t, \rho) \notin C$ (recall that $C$ is the set of intrinsics). That is, we
assume that closures are not part of the intrinsics. In particular, this prevents
intrinsic operations (including the use of assume $d$, $d \in D \subset C$) from producing
closures, which would needlessly complicate the analysis without any benefit.

Consider the program in Fig. 3, and assume that weight requires suspension.
Clearly, the expression labeled by $w_1$ at line 20 then requires suspension.
Furthermore, $w_1$ evaluates as part of the larger expression labeled by $t_8$ at line 10.
Consequently, the evaluation of $t_8$ also requires suspension. Also, $t_8$ evaluates
as part of an application of the abstraction binding obs at line 7. In particular,
the abstraction binding obs is bound to iter, and we apply iter at lines 23 and 33.
Thus, the expressions named by $t_{17}$ and $t_{22}$ require suspension. In summary,
we have that $w_1$, $t_8$, $t_{17}$, and $t_{22}$ require suspension, and we also note that all
applications of the abstraction binding obs require suspension.

We proceed to the formalization and first introduce standard abstract values. 


Definition 7 (Abstract values). We define the abstract values $a \in A$ as
$a ::= \lambda x.y \mid \mathit{const}_x\, n$ for $x, y \in X$ and $n \in \mathbb{N}$.

The abstract value $\lambda x.y$ represents all closures originating at, e.g., a term
λx. let y = 1 in y in a program at runtime (recall that we assume that the
variables $x$ and $y$ are unique). Note that the $y$ indicates the name returned by the
body (formalized by the function NAME in Algorithm 1). The abstract value
$\mathit{const}_x\, n$ represents all intrinsic functions of arity $n$ originating at $x$. For
example, $\mathit{const}_x\, 2$ originates at, e.g., a term let x = + in t.

Algorithm 1 Constraint generation for the suspension analysis. We write the
functional-style pseudocode for the algorithm itself in sans serif font to
distinguish it from terms in T.

    function GENERATECONSTRAINTS(t): T_ANF → P(R) =
     1  match t with
     2  | x → ∅
     3  | let x = t1 in t2 →
     4    GENERATECONSTRAINTS(t2) ∪
     5    match t1 with
     6    | y → {S_y ⊆ S_x}
     7    | c → if |c| > 0 then {const_x |c| ∈ S_x}
     8          else ∅
     9    | λy. t_y → GENERATECONSTRAINTS(t_y)
    10        ∪ {λy. NAME(t_y) ∈ S_x}
    11        ∪ {suspend_n ⇒ suspend_y
    12           | n ∈ SUSPENDNAMES(t_y)}
    13    | lhs rhs → {
    14        ∀z ∀y  λz.y ∈ S_lhs
    15          ⇒ (S_rhs ⊆ S_z) ∧ (S_y ⊆ S_x),
    16        ∀y ∀n  const_y n ∈ S_lhs ∧ n > 1
    17          ⇒ const_y (n − 1) ∈ S_x,
    18        ∀y  λy._ ∈ S_lhs
    19          ⇒ (suspend_y ⇒ suspend_x),
    20        ∀y  const_y _ ∈ S_lhs
    21          ⇒ (suspend_y ⇒ suspend_x),
    22        suspend_x ⇒
    23          (∀y  λy._ ∈ S_lhs ⇒ suspend_y)
    24          ∧ (∀y  const_y _ ∈ S_lhs ⇒ suspend_y)
    25      }
    26    | assume _ →
    27        if suspend_assume then {suspend_x} else ∅
    29    | weight _ →
    30        if suspend_weight then {suspend_x} else ∅
    31    | if y then t_t else t_e →
    32        GENERATECONSTRAINTS(t_t)
    33        ∪ GENERATECONSTRAINTS(t_e)
    34        ∪ {S_NAME(t_t) ⊆ S_x, S_NAME(t_e) ⊆ S_x}
    35        ∪ {suspend_n ⇒ suspend_x
    36           | n ∈ SUSPENDNAMES(t_t)
    37               ∪ SUSPENDNAMES(t_e)}

    39  function NAME(t): T_ANF → X =
    40  match t with
    41  | x → x
    42  | let x = t1 in t2 → NAME(t2)

    44  function SUSPENDNAMES(t): T_ANF → P(X) =
    45  match t with
    46  | x → ∅
    47  | let x = t1 in t2 →
    48    SUSPENDNAMES(t2) ∪
    49    match t1 with
    50    | lhs rhs → {x}
    51    | if y then t_t else t_e → {x}
    52    | assume _ →
    53        if suspend_assume then {x} else ∅
    54    | weight _ →
    55        if suspend_weight then {x} else ∅
    56    | _ → ∅


The central objects in the analysis are sets $S_x \in \mathcal{P}(A)$ and boolean values
$\mathit{suspend}_x$ for all $x \in X$. The set $S_x$ contains all abstract values that may flow to
the expression labeled by $x$, and $\mathit{suspend}_x$ indicates whether or not the expression
requires suspension. A trivial but useless solution is $S_x = A$ and $\mathit{suspend}_x = \mathit{true}$
for all variables $x$ in the program. To get more precise information regarding
suspension, we wish to find smaller solutions to the $S_x$ and $\mathit{suspend}_x$.


To formalize the set of sound solutions for $S_x$ and $\mathit{suspend}_x$, we generate
constraints $c \in R$ for programs. Algorithm 1 formalizes the necessary constraints
for programs $t \in T_{\mathit{ANF}}$ with a function GENERATECONSTRAINTS that recursively
traverses the program $t$ to generate a set of constraints. Due to ANF, there are
only two cases in the top match (line 1). Variables generate no constraints, and
the important case is for let expressions at lines 3-30. The algorithm makes use
of an auxiliary function NAME (line 39) that determines the name of an ANF
expression, and a function SUSPENDNAMES (line 44) that determines the names
of all top-level expressions within an expression that may suspend (namely,
applications, if expressions, and assume and/or weight).

We next illustrate and motivate the generated constraints by considering the
set of constraints GENERATECONSTRAINTS($t_{\mathit{example}}$), where $t_{\mathit{example}}$ is the
program in Fig. 3. Many constraints are standard, and we therefore focus on the
new suspension constraints introduced as part of this paper. In particular, the
challenge is to correctly capture the flow of suspension requirements across
function applications and higher-order functions. First, we see that defining aliases
(line 6) generates constraints of the form $S_y \subseteq S_x$, that constants introduce
const abstract values (e.g., $\mathit{const}_{t_6}\, 1 \in S_{t_6}$), and that assume and weight
introduce suspension requirements, e.g., $\mathit{suspend}_{w_1}$ (shorthand for $\mathit{suspend}_{w_1} = \mathit{true}$).

Next, we consider the constraints generated for $\lambda\mathit{obs}.$ (line 7 in Fig. 3) through
the case at lines 9-12 in Algorithm 1. To keep the example simple, we treat the
unexpanded let rec as an ordinary let in the analysis (for this particular
example, the analysis result is unaffected). Omitting the recursively generated
constraints for the abstraction body, the generated constraints are

$$\{\lambda\mathit{obs}.\, t_8 \in S_{\mathit{iter}}\} \cup \{\mathit{suspend}_n \Rightarrow \mathit{suspend}_{\mathit{obs}} \mid n \in \{t_7, t_8\}\}. \tag{4}$$

The first constraint is standard and states that the abstract value $\lambda\mathit{obs}.\, t_8$ flows to
$S_{\mathit{iter}}$, as the variable naming the body of the $\lambda\mathit{obs}$ expression is $t_8$ at line 26 in
Fig. 3. The remaining constraints are new and set
up the flow of suspension requirements. Specifically, the abstraction obs itself
requires suspension if any expression bound by a top-level let in its body requires
suspension. For efficiency, we only set up dependencies for expressions that may
suspend (formalized by SUSPENDNAMES in Algorithm 1). Note here that we do
not add the constraint $\mathit{suspend}_{w_1} \Rightarrow \mathit{suspend}_{\mathit{obs}}$, as $w_1$ is not at top-level in the
body of obs. Instead, we later add the constraint $\mathit{suspend}_{w_1} \Rightarrow \mathit{suspend}_{t_8}$, and
$\mathit{suspend}_{w_1} \Rightarrow \mathit{suspend}_{\mathit{obs}}$ follows by transitivity.

The constraints generated for the if bound to $t_8$ at line 10 through the case
at lines 31-37 in Algorithm 1 are (omitting recursively generated constraints)

$$\{S_{t_9} \subseteq S_{t_8},\; S_{t_{17}} \subseteq S_{t_8}\} \cup \{\mathit{suspend}_n \Rightarrow \mathit{suspend}_{t_8} \mid n \in \{t_{11}, t_{13}, t_{14}, w_1, t_{16}, t_{17}\}\}. \tag{5}$$

The first two constraints are standard, and state that abstract values in the
results of both branches flow to the result $S_{t_8}$. The last set of constraints is new
and similar to the abstraction suspension constraints. The constraints capture
that all expressions at top-level in both branches that require suspension also
cause $t_8$ to require suspension.
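The name sets appearing in (4) and (5) can be reproduced with a direct Python transcription of NAME and SUSPENDNAMES. This is a sketch under an assumed tuple encoding of ANF terms; the helper `lets` and the encoding are hypothetical, and the flags default to the SMC configuration used in the examples.

```python
def lets(bindings, result):
    """Build an ANF term from a list of (name, rhs) pairs and a result name."""
    t = ("var", result)
    for x, rhs in reversed(bindings):
        t = ("let", x, rhs, t)
    return t

def name(t):
    """NAME: the variable that an ANF term ultimately returns."""
    while t[0] == "let":
        t = t[3]
    return t[1]

def suspend_names(t, suspend_assume=False, suspend_weight=True):
    """SUSPENDNAMES: names of top-level lets in t that may suspend."""
    names = set()
    while t[0] == "let":
        _, x, rhs, body = t
        tag = rhs[0]
        if tag in ("app", "if"):
            names.add(x)
        elif tag == "assume" and suspend_assume:
            names.add(x)
        elif tag == "weight" and suspend_weight:
            names.add(x)
        t = body
    return names

# The body of the abstraction at line 7 of Fig. 3.
then_branch = lets([("t9", ("const", "()"))], "t9")
else_branch = lets([
    ("t10", ("const", "fBernoulli")),
    ("t11", ("app", "t10", "a1")),
    ("t12", ("const", "head")),
    ("t13", ("app", "t12", "obs")),
    ("t14", ("app", "t11", "t13")),
    ("w1", ("weight", "t14")),
    ("t15", ("const", "tail")),
    ("t16", ("app", "t15", "obs")),
    ("t17", ("app", "iter", "t16")),
], "t17")
body = lets([
    ("t6", ("const", "null")),
    ("t7", ("app", "t6", "obs")),
    ("t8", ("if", "t7", then_branch, else_branch)),
], "t8")
```

Note that `suspend_names(body)` does not descend into the branches of the if, which is why $w_1$ is absent from (4) and only reaches obs by transitivity through $t_8$.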

Consider the application at line 23 in Fig. 3. The generated constraints
through the case at lines 13-25 in Algorithm 1 are

$$\begin{aligned}
\{\; & \forall z\, \forall y \;\; \lambda z.y \in S_{\mathit{iter}} \Rightarrow (S_{t_{16}} \subseteq S_z) \land (S_y \subseteq S_{t_{17}}), \\
& \forall y\, \forall n \;\; \mathit{const}_y\, n \in S_{\mathit{iter}} \land n > 1 \Rightarrow \mathit{const}_y\, (n - 1) \in S_{t_{17}}, \\
& \forall y \;\; \lambda y.\_ \in S_{\mathit{iter}} \Rightarrow (\mathit{suspend}_y \Rightarrow \mathit{suspend}_{t_{17}}), \\
& \forall y \;\; \mathit{const}_y\, \_ \in S_{\mathit{iter}} \Rightarrow (\mathit{suspend}_y \Rightarrow \mathit{suspend}_{t_{17}}), \\
& \mathit{suspend}_{t_{17}} \Rightarrow (\forall y \;\; \lambda y.\_ \in S_{\mathit{iter}} \Rightarrow \mathit{suspend}_y) \\
& \qquad\qquad\quad \land\; (\forall y \;\; \mathit{const}_y\, \_ \in S_{\mathit{iter}} \Rightarrow \mathit{suspend}_y) \;\}.
\end{aligned} \tag{6}$$

The first two constraints are standard and state how abstract values flow as a
result of applications. The last three constraints are new and relate to suspension.
The third and fourth constraints state that if an abstraction or intrinsic requiring
suspension flows to iter, the result $t_{17}$ of the application also requires suspension.
The fifth constraint states that if the result $t_{17}$ requires suspension, then all
abstractions and constants flowing to iter require suspension. This last constraint
is not strictly required to later prove the soundness of the analysis in Theorem 1,
but, as we will see in Section 5, it is required for the selective CPS transformation.

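We find a solution to the constraints through a fairly standard algorithm that
propagates abstract values according to the constraints until fixpoint. However,
we extend the algorithm to support the new suspension constraints. The
algorithm is a function ANALYZESUSPEND: $T_{\mathit{ANF}} \to ((X \to \mathcal{P}(A)) \times \mathcal{P}(X))$. The
function returns a map $\mathit{data} : X \to \mathcal{P}(A)$ that assigns sets of abstract values to all
$S_x$ and a set $\mathit{suspend} : \mathcal{P}(X)$ that assigns $\mathit{suspend}_x = \mathit{true}$ iff $x \in \mathit{suspend}$.
Importantly, the assignments to $S_x$ and $\mathit{suspend}_x$ satisfy all generated constraints. To
illustrate the algorithm, here are the analysis results ANALYZESUSPEND($t_{\mathit{example}}$):

$$\begin{aligned}
S_{\mathit{iter}} &= \{\lambda\mathit{obs}.\, t_8\} & S_{t_6} &= \{\mathit{const}_{t_6}\, 1\} & S_{t_{10}} &= \{\mathit{const}_{t_{10}}\, 2\} \\
S_{t_{11}} &= \{\mathit{const}_{t_{10}}\, 1\} & S_{t_{12}} &= \{\mathit{const}_{t_{12}}\, 1\} & S_{t_{15}} &= \{\mathit{const}_{t_{15}}\, 1\} \\
S_n &= \varnothing \mid \text{all other } n \in X \\
\mathit{suspend}_n &= \mathit{true} \mid n \in \{\mathit{obs}, w_1, t_8, t_{17}, t_{22}\} \\
\mathit{suspend}_n &= \mathit{false} \mid \text{all other } n \in X.
\end{aligned} \tag{7}$$

The above results confirm our earlier reasoning: the expressions labeled by obs,
$w_1$, $t_8$, $t_{17}$, and $t_{22}$ may require suspension.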
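The suspension part of the fixpoint computation can be sketched as a worklist propagation in Python. The sketch is a deliberate simplification and not the algorithm used in the paper: it assumes that the flow of abstract values has already been resolved, so that every conditional suspension constraint has become a plain implication between names. The function `solve_suspend` and the edge list are hypothetical, and the edges cover only the body of iter in Fig. 3.

```python
from collections import defaultdict, deque

def solve_suspend(seeds, implications):
    """Least set that contains `seeds` and is closed under (a => b) edges."""
    succ = defaultdict(list)
    for a, b in implications:
        succ[a].append(b)
    suspend = set(seeds)
    work = deque(seeds)
    while work:
        a = work.popleft()
        for b in succ[a]:
            if b not in suspend:       # newly derived fact: propagate further
                suspend.add(b)
                work.append(b)
    return suspend

# suspend_weight = true seeds the weight at line 20.
seeds = ["w1"]
implications = [
    # Constraints (5): top-level names in both branches imply t8.
    ("t11", "t8"), ("t13", "t8"), ("t14", "t8"),
    ("w1", "t8"), ("t16", "t8"), ("t17", "t8"),
    # Constraints (4): top-level names in the body imply obs.
    ("t7", "obs"), ("t8", "obs"),
    # Constraints (6), resolved with the abstraction obs flowing to iter.
    ("obs", "t17"), ("t17", "obs"),
]
result = solve_suspend(seeds, implications)
```

The result contains exactly obs, w1, t8, and t17, which agrees with (7) restricted to the names modeled here; names such as t7 and t11 are never reached and keep the value false.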

We now consider the soundness of the analysis. First, the soundness of 0- 
CFA is well established (see, e.g., Nielson et al. [39]) and extends to our new 
constraints, and we take the following lemma to hold without proof. 


Lemma 1 (0-CFA soundness). For every $t \in T_{\mathit{ANF}}$, the solution given by
ANALYZESUSPEND($t$) for $S_x$ and $\mathit{suspend}_x$, $x \in X$, satisfies the constraints
GENERATECONSTRAINTS($t$).


Next, we must show that the constraints themselves are sound. Consider the
evaluation of an arbitrary term $t \in T_{\mathit{ANF}}$. For each subderivation of $t$, labeled by
a name $x$ (due to ANF), it must hold that $\mathit{suspend}_x = \mathit{true}$ if the subderivation
requires suspension. Otherwise, the analysis is unsound. Theorem 1 formally
captures the soundness. Note that the analysis is conservative (i.e., incomplete),
because it may find $\mathit{suspend}_x = \mathit{true}$ even if the subderivation for $x$ does not
require suspension.


Theorem 1 (Suspension analysis soundness). Let $t \in T_{\mathit{ANF}}$, $s \in S$,
$u \in \{\mathit{false}, \mathit{true}\}$, $w \in \mathbb{R}$, and $v \in V$ such that $\varnothing \vdash t \; {}^{s}\!\Downarrow^{u}_{w} \; v$. Now, let $S_x$ and
$\mathit{suspend}_x$ for $x \in X$ be assigned according to ANALYZESUSPEND($t$). For every
subderivation $(\rho \vdash \texttt{let } x = t_1 \texttt{ in } t_2 \; {}^{s_1 \| s_2}\!\Downarrow^{u_1 \lor u_2}_{w_1 \cdot w_2} \; v')$ of $(\varnothing \vdash t \; {}^{s}\!\Downarrow^{u}_{w} \; v)$,
$u_1 = \mathit{true}$ implies $\mathit{suspend}_x = \mathit{true}$.

The proof uses Lemma 1 and structural induction over the derivation
$\varnothing \vdash t \; {}^{s}\!\Downarrow^{u}_{w} \; v$.
Next, we use the suspension analysis to selectively CPS transform programs.

Algorithm 2 Selective continuation-passing style transformation. We define
t_id = λx. x. The term c_cps is the CPS version of c. We write the functional-style
pseudocode for the algorithm itself in sans serif font to distinguish it from terms
in T.

    function CPS(vars, t): P(X) × T_ANF → T+ =
     1  return CPS'(t_id, t)

     3  function CPS'(cont, t): T × T_ANF → T+ =
     4  match t with
     5  | x → if cont = t_id then t else cont t
     6  | let x = t1 in t2 →
     7    let t2' = CPS'(cont, t2) in
     8    match t1 with
     9    | y → let x = t1 in t2'
    10    | c → let x =
    11        (if x ∈ vars then c_cps else c) in t2'
    12    | λy. t_y →
    13      let t1' = if y ∈ vars
    14        then λk. λy. CPS'(k, t_y)
    15        else λy. CPS'(t_id, t_y)
    16      in
    17      let x = t1' in t2'
    18    | lhs rhs →
    19      if x ∈ vars then
    20        if TAILCALL(t)
    21        then lhs cont rhs
    22        else lhs (λx. t2') rhs
    23      else let x = t1 in t2'
    28    | if y then t_t else t_e →
    29      if x ∈ vars then
    30        if TAILCALL(t) then
    31          if y then CPS'(cont, t_t)
    32          else CPS'(cont, t_e)
    33        else
    34          let k = λx. t2' in
    35          if y then CPS'(k, t_t) else CPS'(k, t_e)
    36      else let x = if y then CPS'(t_id, t_t)
    37        else CPS'(t_id, t_e) in t2'
    38    | assume y →
    39      if x ∈ vars then
    40        if TAILCALL(t)
    41        then Suspension_assume(y, cont)
    42        else Suspension_assume(y, λx. CPS'(cont, t2))
    43      else let x = t1 in t2'
    44    | weight y →
    45      if x ∈ vars then
    46        if TAILCALL(t)
    47        then Suspension_weight(y, cont)
    48        else Suspension_weight(y, λ_. CPS'(cont, t2))
    49      else let x = t1 in t2'

    51  function TAILCALL(t): T_ANF → {false, true} =
    52  match t with
    53  | let x = _ in x → true
    54  | _ → false


5 Selective CPS Transformation 


This section presents the second technical contribution: the selective CPS
transformation. The transformations themselves are standard, and the challenge is to
correctly use the suspension analysis results for a selective transformation.
Algorithm 2 is the full algorithm. Using terms in ANF as input significantly
helps reduce the algorithm's complexity. The main function CPS takes as input
a set $\mathit{vars} : \mathcal{P}(X)$, indicating which expressions to CPS transform, and a
program $t \in T_{\mathit{ANF}}$ to transform. It is the new vars argument that separates the
transformation from a standard CPS transformation. For the purposes of this
paper, we always use $\mathit{vars} = \{x \mid \mathit{suspend}_x = \mathit{true}\}$, where the $\mathit{suspend}_x$ come
from ANALYZESUSPEND($t$). One could also use $\mathit{vars} = X$ for a standard full CPS
transformation (e.g., Fig. 1f), or some other set vars for other application
domains. The value returned from the CPS function is a (non-ANF) term of the
type $T^{+}$. The helper function CPS', initially called at line 1, takes as input a
continuation term cont, indicating the continuation to apply in tail position.
Initially, this continuation term is $t_{\mathit{id}}$, which indicates no continuation. Similarly
to Algorithm 1, the top-level match at line 4 has two cases: a simple case for
variables (line 5) and a complex case for let expressions (lines 6-49). To enable
optimization of tail calls, the auxiliary function TAILCALL indicates whether or
not an ANF expression is a tail call (i.e., of the form let x = t' in x).

     1  let t1 = 2 in
     2  let t2 = 2 in
     3  let t3 = Beta in
     4  let t4 = t3 t1 in
     5  let t5 = t4 t2 in
     6  let a1 = assume t5 in
     7  let rec iter = λk. λobs.
     8    let t6 = null in
     9    let t7 = t6 obs in
    10    if t7 then
    11      let t9 = () in
    12      k t9
    13    else
    14      let t10 = fBernoulli in
    15      let t11 = t10 a1 in
    16      let t12 = head in
    17      let t13 = t12 obs in
    18      let t14 = t11 t13 in
    19      Suspension_weight(t14,
    20        λ_.
    21          let t15 = tail in
    22          let t16 = t15 obs in
    23          iter k t16)
    24  in
    25  let t18 = true in
    26  let t19 = false in
    27  let t20 = true in
    28  let t21 = true in
    29  let t22 = [t21,t20,t19,t18] in
    30  let k' = λ_. a1 in
    31  iter k' t22

Fig. 4: The running example from Fig. 3 after selective CPS transformation. The
program is semantically equivalent to Fig. 1e.


We now illustrate Algorithm 2 by computing CPS($\mathit{vars}_{\mathit{example}}$, $t_{\mathit{example}}$), where
$\mathit{vars}_{\mathit{example}} = \{\mathit{obs}, w_1, t_8, t_{17}, t_{22}\}$ is from (7), and $t_{\mathit{example}}$ is from Fig. 3. Fig. 4
presents the final result. We first note that the transformation does not change
expressions not labeled by a name in $\mathit{vars}_{\mathit{example}}$, as they do not require
suspension. In the following, we therefore focus only on the transformed expressions.
We begin with the abstraction obs defined at line 7 in Fig. 3, handled by the
case at line 12 in Algorithm 2. As $\mathit{obs} \in \mathit{vars}_{\mathit{example}}$, we apply the standard CPS
transformation for abstractions: add a continuation parameter to the
abstraction and recursively transform the body with this continuation. Next, consider
the transformation of the weight expression $w_1$ at line 20 in Fig. 3, handled by
the case at line 44 in Algorithm 2. The expression is not at tail position, so we
build a new continuation containing the subsequent let expressions, recursively
transform the body of the continuation, and then wrap the end result in a
Suspension object. The if expression $t_8$ at line 10 in Fig. 3, handled by the case
at line 28 in Algorithm 2, is in tail position (it is directly followed by returning
$t_8$). Consequently, we transform both branches recursively. Finally, we have the
applications $t_{17}$ and $t_{22}$ at lines 23 and 33 in Fig. 3, handled by the case at
line 18 in Algorithm 2. The application $t_{17}$ is at tail position, and we transform
it by adding the current continuation as an argument. The application at $t_{22}$ is
not at tail position, so we construct a continuation $k'$ that returns the final value
$a_1$ (line 34 in Fig. 3), and then add it as an argument to the application.
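To show how a runtime can drive the transformed program, the following Python sketch mirrors the CPS-transformed iter of Fig. 4. It is an illustration under several assumptions: the class `Suspension` is a hypothetical stand-in for the Suspension object created at weight, `run` is a toy driver rather than an actual inference runtime, and the assume at line 6 is replaced by passing `a1` as a parameter because it does not suspend in this configuration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Suspension:
    """Hypothetical stand-in for the suspension object built at weight."""
    weight: float
    cont: Callable[[None], object]   # resumes the rest of the program

def make_iter(a1: float):
    """Build the CPS version of iter for a fixed draw a1."""
    def iter_(k, obs):
        if not obs:                        # null obs
            return k(None)                 # tail position: call continuation
        w = a1 if obs[0] else 1.0 - a1     # fBernoulli a1 (head obs)
        # Suspend at weight; the remaining lets live in the continuation.
        return Suspension(w, lambda _: iter_(k, obs[1:]))
    return iter_

def run(a1: float, obs):
    """Drive the program to completion, resuming after every suspension."""
    total, suspensions = 1.0, 0
    result = make_iter(a1)(lambda _: a1, obs)   # k' returns the final value
    while isinstance(result, Suspension):
        total *= result.weight             # an SMC runtime could resample here
        suspensions += 1
        result = result.cont(None)
    return result, total, suspensions

value, total, n = run(0.5, [True, True, False, True])
```

With `a1 = 0.5` the driver observes four suspensions, one per call to weight, and accumulates the weight $0.5^4 = 0.0625$ before the continuation `k'` returns the final value. Because control returns to the driver at every suspension, the runtime decides when and whether to resume each execution.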




[Diagram omitted. Recoverable labels: CorePPL Program, Suspension Analysis, Inference-Specific Compilers, Inference-Specific Runtimes, Backend Compiler, Executable; regions Section 6.1 and Section 6.2.]

Fig. 5: Overview of the Miking CorePPL compiler implementation. We divide 
the overall compiler into two parts, (i) suspension analysis and selective CPS 
(Section 6.1) and (ii) inference problem extraction (Section 6.2). The figure de- 
picts artifacts as gray rectangular boxes and transformation units and libraries 
as blue rounded boxes. Note how the inference extractors transformation sep- 
arates the program into two different paths that are combined again after the 
inference-specific compilation. The white inheritance arrows (pointing to suspen- 
sion analysis and selective CPS transformations) mean that these libraries are 
used within the inference-specific compiler transformation. 


Algorithm 2 is not guaranteed to produce a correct result for an arbitrary set
vars. Specifically, for all applications lhs rhs, we must ensure that (i) if we CPS
transform the application, we also CPS transform all possible abstractions that can occur
at lhs, and (ii) if we do not CPS transform the application, we do not CPS
transform any abstraction that can occur at lhs. We control this through the
argument vars. In particular, assigning vars according to the suspension analysis
produces a correct result. To see this, consider the application constraints at
lines 13-25 in Algorithm 1 again, and note that if any abstraction or intrinsic
operation that requires suspension occurs at lhs, then $\mathit{suspend}_x = \mathit{true}$. Furthermore,
the last application constraint ensures that if $\mathit{suspend}_x = \mathit{true}$, then all
abstractions and intrinsic operations that occur at lhs require suspension. Consequently,
for all $\lambda y.\_$ and $\mathit{const}_y\, \_$ that occur at lhs, either all $\mathit{suspend}_y = \mathit{true}$ or all $\mathit{suspend}_y = \mathit{false}$.
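Conditions (i) and (ii) can be checked mechanically for a candidate set vars. The Python sketch below does so for abstractions only, assuming hypothetical inputs: `applications` maps the name of each application to its lhs variable, and `flows` maps a variable to the parameter names of the abstractions that may flow to it (as a data-flow analysis would report).

```python
def inconsistent_applications(applications, flows, vars_):
    """Return the applications that violate condition (i) or (ii).

    An application x is consistent if every abstraction that may occur
    at its lhs is CPS transformed exactly when x itself is.
    """
    bad = []
    for x, lhs in applications.items():
        callee_flags = {y in vars_ for y in flows.get(lhs, ())}
        if callee_flags and callee_flags != {x in vars_}:
            bad.append(x)
    return bad

# The recursive call at line 23 of Fig. 3: iter may be the abstraction obs.
applications = {"t17": "iter"}
flows = {"iter": {"obs"}}

ok = inconsistent_applications(applications, flows, {"obs", "w1", "t8", "t17"})
broken = inconsistent_applications(applications, flows, {"t17"})
```

The first call reports no violation, whereas the second flags `t17`: transforming the call site without transforming the abstraction obs would pass a continuation to a function that does not expect one.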


6 Implementation 


We implement the suspension analysis and selective CPS transformation in Mik- 
ing CorePPL [30], a core PPL implemented in the domain-specific language 
construction framework Miking [9]. We choose Miking CorePPL for the imple- 
mentation over other CPS-based PPLs, as the language implementation contains 
an existing 0-CFA base implementation which simplifies the suspension analysis 
implementation. Fig. 5 presents the organization of the CorePPL compiler. The 
input is a CorePPL program that may contain many inference problems and ap- 
plications of inference algorithms, similar to WebPPL and Anglican. The output 
is an executable produced by one of the Miking backend compilers. Section 6.1 
gives the details of the suspension analysis and selective CPS implementations, 
and in particular the differences compared to the core calculus in Section 3.
Section 6.2 presents the inference extractor and its operation combined with
selective CPS. The suspension analysis, selective CPS transformation, and inference
extraction implementations consist of roughly 1500 lines of code (a contribution
in this paper). The code is available on GitHub [2].


6.1 Suspension Analysis and Selective CPS 


Miking CorePPL extends the abstract syntax in Definition 1 with standard func- 
tional data structures and features such as algebraic data types (records, tuples, 
and variants), lists, and pattern matching. The suspension analysis and selective 
CPS implementations in Miking CorePPL extend Algorithm 1 and Algorithm 2 
to support these language features. Furthermore, compared to $\mathit{suspend}_{\mathit{weight}}$ and
$\mathit{suspend}_{\mathit{assume}}$ in Fig. 2, the implementation allows arbitrary configuration of
suspension sources. In particular, the implementation uses this arbitrary configuration
together with the alignment analysis by Lundén et al. [29]. This combination
allows selectively CPS transforming to suspend at a subset of assumes or weights 
for aligned versions of SMC and MCMC inference algorithms. 

Miking CorePPL also includes a framework for inference algorithm imple- 
mentation. Specifically, to implement new inference algorithms, users implement 
an inference-specific compiler and inference-specific runtime. Fig. 5 illustrates 
the different compilers and runtimes. Each inference-specific compiler applies 
the suspension analysis and selective CPS transformation to suit the inference 
algorithm’s particular suspension requirements. 

Next, we show how Miking CorePPL handles programs containing many 
inference problems solved with different inference algorithms. 


6.2 Inference Problem Extraction 


Fig. 5 includes the inference extraction compiler procedure. First, the compiler 
applies an inference extractor to the input program. The result is a set of infer- 
ence problems and a main program containing remaining glue code. Second, the 
compiler applies inference-specific compilers to each inference problem. Finally, 
the compiler combines the main program and the compiled inference problems 
with inference-specific runtimes and supplies the result to a backend compiler. 
Consider the example in Fig. 6a. We define a function m that constructs a
minimal inference problem on lines 7-10, using a single call to assume and a
single call to observe (modifying the execution weight similar to weight). The
function takes an initial probability distribution d and a data point y as input.
We apply aligned lightweight MCMC inference for the inference problem through
the infer construct on lines 12-16. The first argument to infer gives the
inference algorithm configuration, and the second argument the inference problem.
Inference problems are thunks (i.e., functions with a dummy unit argument).
We construct the inference problem thunk by an application of m with a uniform
initial distribution and data point 1.0. The inference result d0 is another
probability distribution, and we use it as the first initial distribution in the recursive
repeat function (lines 19-24). This function repeatedly performs inference
using the SMC bootstrap particle filter (lines 21-23), again using the function m
to construct the sequence of inference problems. Each infer application uses
the result distribution from the previous iteration as the initial distribution and
consumes data points from the data sequence. We extract and print the samples
from the final result distribution d1 at lines 29-33. A limitation with the current
extraction approach is that we do not yet support nested infers.

     1  mexpr
     2  let data = [
     3    24.0, 42.2, 96.7, 9.2, 85.8,
     4    34.2, 41.7, 53.4, 85.6, 45.4
     5  ] in
     6
     7  let m = lam d. lam y. lam.
     8    let x = assume d in
     9    observe y (Gaussian x 0.1);
    10    x in
    11
    12  let d0 =
    13    infer (LightweightMCMC
    14      { iterations = 100,
    15        aligned = true })
    16      (m (Uniform 0.0 4.0) 1.0) in
    17
    18  recursive let repeat =
    19    lam data. lam d.
    20      match data with [y] ++ data then
    21        let posterior =
    22          infer (BPF {particles = 100})
    23            (m d y) in
    24        repeat data posterior
    25      else d
    26  let d1 = repeat data d0 in
    27  match distEmpiricalSamples d1
    28    with (samples, weights) in
    29  iter
    30    (lam s.
    31      print
    32        (concat (float2string s) "\n"))
    33    samples

(a) Miking CorePPL program.

     1  let m = lam d. lam y. lam.
     2    let x = assume d in
     3    observe y (Gaussian x 0.1);
     4    x in
     5  m (Uniform 0.0 4.0) 1.0 ()

(b) Extracted inference problem from line 13 in (a).

     1  let m = lam d. lam y. lam.
     2    let x = assume d in
     3    observe y (Gaussian x 0.1);
     4    x in
     5  m d y ()

(c) Extracted inference problem from line 22 in (a).

Fig. 6: Example Miking CorePPL program in (a) with two non-trivial uses of
infer. Figures (b) and (c) show the extracted and selectively CPS-transformed
inference problems at lines 13 and 22 in (a), respectively. The compiler handles
the free variables d and y in (c) in a later stage.


A key challenge in the compiler design is how to handle different inference 
algorithms within one probabilistic program. In particular, inference algorithms 
require different selective CPS transformations, applied to different parts of the 
code. To allow the separate handling of inference algorithms, we apply the ex- 
traction approach by Hummelgren et al. [22] on the infer applications, pro- 
ducing separate inference problems for each occurrence of infer. Although the 
compiler design mostly concerns rather comprehensive engineering work, special 
care must be taken to handle the non-trivial problem of name bindings when 
transforming and combining different code entities together. For instance, the 
compiler must selectively CPS transform Fig. 6b to suspend at assume (required
by MCMC) and selectively CPS transform Fig. 6c to suspend at observe
(required by SMC). We design a robust and modular solution, where it is possible
to easily add new inference algorithms without worrying about name conflicts.


7 Evaluation 


This section presents the evaluation of the suspension analysis and selective CPS 
implementations. Our main claims are that (i) the approach of selective CPS sig- 
nificantly improves performance compared to traditional full CPS, and (ii) that 
this holds for a significant set of inference algorithms, evaluated on realistic infer- 
ence problems. We use four PPL models and corresponding data sets from the 
Miking benchmarks repository, available on GitHub [1]. The models are: con- 
stant rate birth-death (CRBD) in Section 7.1, cladogenetic diversification rate 
shift (ClaDS) in Section 7.2, latent Dirichlet allocation (LDA) in Section 7.3, 
and vector-borne disease (VBD) in Section 7.4. All models are significant and 
actively used in different research areas: CRBD and ClaDS in evolutionary bi- 
ology and phylogenetics [37,43,32], LDA in topic modeling [7], and VBD in epi- 
demiology [14,34]. In addition to the Miking CorePPL models from the Miking 
benchmarks, we also implement CRBD in WebPPL and Anglican. 

We add a number of popular inference algorithms in Miking CorePPL with 
support for selective CPS. The first is standard likelihood weighting (LW), as 
introduced in Section 2. LW does not strictly require CPS, but we implement 
it with suspensions at weight to highlight the difference between no CPS, se- 
lective CPS, and full CPS. LW gives a good direct measure of CPS overhead 
as the algorithm simply executes programs many times. Suspending at weight 
can also be useful in LW to stop executions with weight 0 (i.e., useless samples) 
early. However, we do not use early stopping to isolate the effect CPS has on 
execution time. Next, we add the bootstrap particle filter (BPF) and alive parti- 
cle filter (APF). Both are SMC algorithms that suspend at weight to resample 
executions. BPF is a standard algorithm often used in PPLs, and APF is a re- 
lated algorithm introduced in a PPL context by Kudlicka et al. [24]. The final 
two inference algorithms we add are aligned lightweight MCMC (just MCMC
for short) and particle-independent Metropolis–Hastings. Aligned lightweight
MCMC [29] is an extension to the standard PPL Metropolis–Hastings approach
introduced by Wingate et al. [49], and suspends at a subset of calls to assume.
Particle-independent Metropolis–Hastings (PIMH) is an MCMC algorithm that
repeatedly uses the BPF (suspending at weight) within a Metropolis–Hastings
MCMC algorithm [40]. We limit the scope to single-core CPU inference.
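Likelihood weighting, the simplest of the algorithms above, can be sketched in a few lines of Python for the running example of Fig. 3. This is a plain illustration without CPS and without suspension, not the Miking CorePPL implementation; the function names and the fixed seed are assumptions of the sketch.

```python
import random

def likelihood_weighting(obs, n_samples, seed=0):
    """Run the program n_samples times; return (sample, weight) pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        a1 = rng.betavariate(2.0, 2.0)   # assume (Beta 2 2): draw from prior
        w = 1.0
        for o in obs:                    # weight: accumulate the likelihood
            w *= a1 if o else 1.0 - a1
        samples.append((a1, w))
    return samples

def posterior_mean(samples):
    """Weighted average of the samples (self-normalized estimate)."""
    total = sum(w for _, w in samples)
    return sum(a * w for a, w in samples) / total

samples = likelihood_weighting([True, True, False, True], 20000)
mean = posterior_mean(samples)
```

For these four observations the exact posterior is Beta(5, 3), so the estimate should lie close to its mean 5/8. Each sample is one complete execution of the program, which is why the running time of LW directly reflects the per-execution overhead introduced by CPS.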

In addition to the inference algorithms in Miking CorePPL, we also use three 
other state-of-the-art PPLs for CRBD: Anglican, WebPPL, and the special high- 
performance RootPPL compiler for Miking CorePPL [30]. For Anglican, we ap- 
ply LW, BPF, and PIMH inference. For WebPPL, we use BPF and (non-aligned) 
lightweight MCMC. For the RootPPL version of Miking CorePPL, we use BPF 
inference (the only supported inference algorithm). 

Fig. 7: Mean execution times for the CRBD model (panels: "Execution Time for
CRBD (500-1000 Samples)" and "Execution Time for CRBD (5000-10000 Samples)").
The error bars show 95% confidence intervals (using the option ('ci', 95) in
Seaborn's barplot). The table shows standard deviations.

                Anglican LW     Anglican BPF    WebPPL BPF      WebPPL MCMC
1000 samples    11.6 ± 0.36 s   5.65 ± 2.71 s   2.42 ± 0.20 s   1.42 ± 0.07 s
10000 samples   90.4 ± 2.12 s   29.1 ± 2.35 s   53.9 ± 4.03 s   3.10 ± 0.77 s

We consider two configurations for each model: 1000 and 10000 samples. An
exception is for CRBD and ClaDS, where we adjust APF to use 500 and 5000
samples to make the inference accuracy comparable to the related BPF. We run
each experiment 300 times (with one warmup run) and measure execution time
(excluding compile time). To justify the efficiency of the suspension analysis and
selective CPS transformation that are part of the compiler, we note here that
they, combined, run in only 1-5 ms for all models.
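
The measurement protocol above (warmup runs that are discarded, followed by repeated timed runs summarized by mean and standard deviation) can be sketched as follows. This is an illustrative harness only, not the scripts used for the experiments; the function name and its parameters are assumptions.

```python
import statistics
import time

def benchmark(run, runs=300, warmups=1):
    """Time `run` repeatedly after discarding warmup runs (times in seconds)."""
    for _ in range(warmups):
        run()  # warmup: neither result nor timing is recorded
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        run()
        times.append(time.perf_counter() - start)
    # Return the summary statistics together with the raw measurements.
    return statistics.mean(times), statistics.stdev(times), times
```

A call such as `benchmark(lambda: infer(model), runs=300)`, with `infer` and `model` standing for whatever is being measured, returns the mean, the sample standard deviation, and the individual timings.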

The experiments do not compare the performance of different inference algo- 
rithms. To do this, one would also need to consider how accurate the inference 
results are for a given amount of execution time. Accuracy varies dramatically 
between different combinations of inference algorithms and models. We evaluate 
the execution time of selective and full CPS in isolation for individual
inference algorithms. Selective CPS is solely an execution time optimization:
the algorithms themselves and their accuracy remain unchanged.

For Miking CorePPL, we used OCaml 4.12.0 as backend compiler for the 
implementation in Section 6 and GCC 7.5.0 for the separate RootPPL com- 
piler. We used Anglican 1.1.0 (OpenJDK 11.0.19) and WebPPL 0.9.15 (Node.js 
16.18.0). We ran the experiments on an Intel Xeon Gold 6148 CPU with 64 GB 
of memory using Ubuntu 18.04.6. 


7.1 Constant Rate Birth-Death 


CRBD is a diversification model, used by evolutionary biologists to infer distri- 
butions over birth and death rates for observed evolutionary trees of groups of 
species, called phylogenies. For the CRBD experiment, we use the Alcedinidae 
phylogeny (Kingfisher birds, 54 extant species) [43,23]. We compare CRBD in 
Miking CorePPL (55 lines of code), Anglican (129 lines of code), and WebPPL
(66 lines of code). The total experiment execution time was 9 hours.

Fig. 8: Mean execution times for the ClaDS model. The error bars show 95%
confidence intervals (using the option ('ci', 95) in Seaborn's barplot).


Fig. 7 presents the results. We note that selective CPS is faster than full CPS 
in all cases. Unlike full CPS, the overhead of selective CPS compared to no CPS 
is marginal for LW. The execution time for early MCMC samples is sensitive 
to initial conditions, and we therefore see more variance for MCMC compared 
to the other algorithms. When we increase the number of samples to 10000, 
the variance reduces. With the exception of MCMC in WebPPL, the execution 
times for Anglican and WebPPL are one order of magnitude slower than the 
equivalent algorithms in Miking CorePPL. However, note that the comparison 
is only for reference and not entirely fair, as Anglican and WebPPL use different 
execution environments compared to Miking CorePPL. Lastly, we note that the 
Miking CorePPL BPF implementation with selective CPS is not much slower 
than when compiling Miking CorePPL to RootPPL BPF—a compiler designed 
specifically for efficiency (but with other limitations, such as the lack of garbage 
collection). RootPPL does not use CPS, and instead enables suspension through 
a low-level transformation using the concept of PPL control-flow graphs [30]. 


7.2 Cladogenetic Diversification Rate Shift 


ClaDS is another diversification model used in evolutionary biology [32,43]. Un- 
like CRBD, it allows birth and death rates to change over time. We again use the 
Alcedinidae phylogeny. The source code consists of 72 lines of code. The total
experiment execution time was 3 hours. Fig. 8 presents the results. We note that 
selective CPS is faster than full CPS in all cases. 


7.3 Latent Dirichlet Allocation 


LDA [7] is a model from natural language processing used to categorize
documents into topics. We use a synthetic data set with size comparable to the data
set in Ritchie et al. [41]: a vocabulary of 100 words, 10 topics, and 25 observed 
documents (30 words in each). We do not apply any optimization techniques such 
as collapsed Gibbs sampling [21]. Solving the inference problem using a PPL is
therefore challenging already for small data sets. The source code consists of 26
lines of code. The total experiment execution time was 12 hours.

Fig. 9: Mean execution times for the LDA model. The error bars show 95%
confidence intervals (using the option ('ci', 95) in Seaborn's barplot).

Fig. 10: Mean execution times for the VBD model. The error bars show 95%
confidence intervals (using the option ('ci', 95) in Seaborn's barplot).

Fig. 9 presents the results. We note that selective CPS is faster than full CPS 
in all cases. Interestingly, the reduction in overhead compared to full CPS for 
LW is not as significant. The reason is that suspension at weight for the model 
requires that we CPS transform the most computationally expensive recursion. 
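
The situation can be illustrated with a small Python sketch, written by hand rather than produced by the compiler; the function names and the state encoding are assumptions. The helper that never calls weight stays in direct style, whereas the loop over words calls weight in every iteration and therefore has to be in CPS itself, so the most frequently executed recursion pays the CPS cost.

```python
import math

def log_likelihood(word, word_probs):
    """Never suspends, so a selective transformation can leave it in direct style."""
    return math.log(word_probs[word])

def observe_words_cps(words, word_probs, k):
    """Weights every word, so the recursion itself must be written in CPS."""
    def loop(i):
        if i == len(words):
            return k()  # continue with the rest of the program
        lw = log_likelihood(words[i], word_probs)
        # Suspend at weight; the continuation resumes the loop at word i + 1.
        return ("weight", lw, lambda: loop(i + 1))
    return loop(0)

def run_to_completion(state):
    """Likelihood-weighting style driver: accumulate weights, never resample."""
    total = 0.0
    while state[0] == "weight":
        total += state[1]
        state = state[2]()
    return total, state[1]
```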


7.4 Vector-Borne Disease 


We use the VBD model from Funk et al. [14] and later Murray et al. [34]. The 
background is a dengue outbreak in Micronesia and the spread of disease between 
mosquitos and humans. The inference problem is to find the true numbers of 
susceptible, exposed, infectious, and recovered (SEIR) individuals each day, given 
daily reported numbers of new cases at health centers. The source code consists 
of 140 lines of code. The total execution time was 8 hours. 

Fig. 10 presents the results. Again, we note that selective CPS is faster than 
full CPS in all cases, except seemingly for APF and 1000 samples. This is very 
likely a statistical anomaly, as the variance for APF is quite severe for the case 
with 1000 samples. Compared to the BPF, APF uses a resampling approach for
which the execution time varies a lot if the number of samples is too low [24].
The plots clearly show this as, compared to 1000 samples, the variance is re- 
duced to BPF-comparable levels for 10000 samples. In summary, the evaluation 
demonstrates the clear benefits of selective CPS over full CPS for universal PPLs. 


8 Related Work 


There are a number of universal PPLs that require non-trivial suspension. One 
such language is Anglican [50], which solves the suspension problem using CPS. 
Anglican performs a full CPS transformation with one exception—certain stat- 
ically known functions named primitive procedures, that include a subset of the 
regular Clojure (the host language of Anglican) functions, are guaranteed to not 
execute PPL code, and Anglican does not CPS transform them [47]. However, 
higher-order functions in Clojure libraries cannot be primitive procedures, and 
Anglican must manually reimplement such functions (e.g., map and fold). An- 
glican does not consider a selective CPS transformation of PPL code, and always 
fully CPS transforms the PPL part of Anglican programs. 

WebPPL [18] and the approach by Ritchie et al. [41] also make use of CPS 
transformations to implement PPL inference. They do not, however, consider 
selective CPS transformations. Ścibior et al. [44] present an architectural design
for a probabilistic functional programming library based on monads and monad
transformers (and corresponding theory in Ścibior et al. [45]). In particular, they
use a coroutine monad transformer to suspend SMC inference. This approach is 
similar to ours in that it makes use of high-level functional language features to 
enable suspension. They do not, however, consider a selective transformation. 

The PPLs Pyro [6], Stan [10,5], Gen [11,27], and Edward [48] either im- 
plement inference algorithms that do not require suspension (e.g., Hamiltonian 
Monte Carlo), or restrict the language in such a way that suspension is explicit 
and trivially handled by the language implementation. For example, SMC in 
Pyro⁸ and newer versions of Birch require that users explicitly write programs
as a step function that the SMC implementation calls iteratively. Resampling 
only occurs in between calls to step, and suspension is therefore trivial. 

Work on general-purpose selective CPS transformations includes Nielsen [38],
Asai and Uehara [4], Rompf et al. [42], and Leijen [26]. They consider typed lan- 
guages, unlike the untyped language in this paper. The early work by Nielsen [38] 
considers the efficient implementation of call/cc through a selective CPS trans- 
formation. The transformation requires manual user annotations, unlike the fully 
automatic approach in this paper. A more recent approach is due to Asai and Ue- 
hara [4], who consider an efficient implementation of delimited continuations us- 
ing shift and reset through a selective CPS transformation. Similar to us, they 
automatically determine where to selectively CPS transform programs. They use 
an approach based on type inference, while our approach builds upon 0-CFA. 
Rompf et al. [42] follow a similar approach to Asai and Uehara [4], but for
Scala, and additionally require user annotations. Leijen [26] uses a type-directed
selective CPS transformation to compile algebraic effect handlers.

8 Note that the main inference algorithm in Pyro is stochastic variational inference,
which does not require suspension.

There are low-level alternatives to CPS for suspension in PPLs. In particular, 
there are various languages and approaches that directly implement support for 
non-preemptive multitasking (e.g., coroutines). Turing [15] and older versions of 
Birch [36,35] implement coroutines to enable arbitrary suspension, but do not 
discuss the implementations in detail. Lundén et al. [30] introduce and use the
concept of PPL control-flow graphs to compile Miking CorePPL to the low-level 
C++ framework RootPPL. The compiler explicitly introduces code that main- 
tains special execution call stacks, distinct from the implicit C++ call stacks. The 
implementation results in excellent performance, but supports neither garbage 
collection nor higher-order functions. Another low-level approach is due to Paige 
and Wood [40], who exploit mutual exclusion locks and the fork system call to
suspend and resample SMC executions. In theory, many of the above low-level 
alternatives to CPS can, if implemented efficiently, result in the least possible 
overhead due to more fine-grained low-level control. The approaches do, however, 
require significantly more implementation effort compared to a CPS transforma- 
tion. Comparatively, the selective CPS transformation is a surprisingly simple, 
high-level, and easy-to-implement alternative that brings the overhead of CPS 
closer to that of more low-level approaches. 


9 Conclusion 


This paper introduces a selective CPS transformation for the purpose of exe- 
cution suspension in PPLs. To enable the transformation, we develop a static 
suspension analysis that determines parts of programs that require a CPS trans- 
formation as a consequence of inference algorithm suspension requirements. We 
implement the suspension analysis, selective CPS transformation, and an infer- 
ence problem extraction procedure (required as a result of the selective CPS 
transformation) in Miking CorePPL. Furthermore, we evaluate the implementation
on real-world models from phylogenetics, topic modeling, and epidemiology.
The results demonstrate significant speedups compared to the standard full CPS 
suspension approach for a large number of Monte Carlo inference algorithms.


Abstract. Logically constrained term rewriting systems (LCTRSs) are a
formalism for program analysis with support for data types that are not 
(co)inductively defined. Only imperative programs have been considered 
through the lens of LCTRSs so far since LCTRSs were introduced as a first- 
order formalism. In this paper, we propose logically constrained simply- 
typed term rewriting systems (LCSTRSs), a higher-order generalization 
of LCTRSs, which suits the needs of representing and analyzing functional 
programs. We also study the termination problem of LCSTRSs and define 
a variant of the higher-order recursive path ordering (HORPO) for the 
newly proposed formalism. 


Keywords: Higher-order term rewriting · Constraints · Recursive path
ordering.


1 Introduction 


It is hardly a surprising idea that term rewriting can serve as a vehicle for 
reasoning about programs. During the last decade or so, the term rewriting 
community has seen a line of work that translates real-world problems from 
program analysis into questions about term rewriting systems, which include, for 
example, termination (see, e.g., [8,10,15,37]) and equivalence (see, e.g., [13,36,9]). 
Such applications take place across programming paradigms due to the versatile 
nature of term rewriting, and often materialize into automatable solutions. 

Data types are a central building block of programs and must be properly 
handled in program analysis. While it is rarely a problem for term rewriting 
systems to represent (co)inductively defined data types, others such as integers 
and arrays traditionally require encoding; think of neg (suc (suc (suc zero))) 
encoding −3. This usually turns out to cause more obfuscation than clarification
to the methods applied and the results obtained. An alternative is to incorporate 
primitive data types into the formalism, which contributes to the proliferation of 
subtly different formalisms that are generally incompatible with each other, and 
it is often difficult to transfer techniques between such formalisms. 

Logically constrained term rewriting systems (LCTRSs) [27,12] emerged from 
this proliferation as a unifying formalism seeking to be general in both the 
selection of primitive data types (little is presumed) and the applicability of 
varied methods (many are extensible). LCTRSs thus allow us to benefit from the 
broad term rewriting arsenal in a wide range of scenarios for program analysis 
(see, e.g., [32,24,23]). In particular, rewriting induction on LCTRSs [12,30] offers
a powerful tool for program verification.

© The Author(s) 2024

As a first-order formalism, LCTRSs only naturally accommodate imperative 
programs. This paper aims to generalize this formalism in a higher-order setting. 


Motivation. Below is a first-order LCTRS implementing the factorial function:

$\mathsf{fact}\ n \to 1\ [n \le 0] \qquad \mathsf{fact}\ n \to n * \mathsf{fact}\ (n - 1)\ [n > 0]$


where $n \le 0$ and $n > 0$ are logical constraints, which the integer $n$ must satisfy
respectively when the corresponding rewrite rule is applied. Suppose we have 
access to higher-order functions, a defining feature of functional programming; 
now we have the following alternative implementation of fact: 


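$\mathsf{fact}\ n \to \mathsf{fold}\ (*)\ 1\ (\mathsf{genlist}\ n)$

$\mathsf{genlist}\ n \to \mathsf{nil}\ [n \le 0] \qquad \mathsf{genlist}\ n \to \mathsf{cons}\ n\ (\mathsf{genlist}\ (n - 1))\ [n > 0]$
$\mathsf{fold}\ f\ y\ \mathsf{nil} \to y \qquad \mathsf{fold}\ f\ y\ (\mathsf{cons}\ x\ l) \to f\ x\ (\mathsf{fold}\ f\ y\ l)$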


Here fold takes an argument f, which itself represents a function. Higher-order 
functions such as fold do not fit into first-order LCTRSs, which leads to the first 
reason to generalize this formalism: to overcome the limitation of its expressivity. 
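
As a quick executable illustration (a test on a few inputs, not an equivalence proof), the two implementations of fact can be transcribed into Python. Python lists stand in for nil and cons, and the names fact_direct and fact_fold are introduced here only to tell the two versions apart.

```python
def fact_direct(n):
    # fact n -> 1 [n <= 0]        fact n -> n * fact (n - 1) [n > 0]
    return 1 if n <= 0 else n * fact_direct(n - 1)

def genlist(n):
    # genlist n -> nil [n <= 0]   genlist n -> cons n (genlist (n - 1)) [n > 0]
    return [] if n <= 0 else [n] + genlist(n - 1)

def fold(f, y, xs):
    # fold f y nil -> y           fold f y (cons x l) -> f x (fold f y l)
    return y if not xs else f(xs[0], fold(f, y, xs[1:]))

def fact_fold(n):
    # fact n -> fold (*) 1 (genlist n)
    return fold(lambda a, b: a * b, 1, genlist(n))
```

Here fold receives the multiplication function as an ordinary argument, which is exactly the higher-order feature that a first-order LCTRS cannot express.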

There is another reason for higher-order LCTRSs. The latter implementation 
of fact reflects a pattern of functional programming: the combination of “standard” 
higher-order building blocks such as fold and map, and functions that are specific 
to the problem at hand. We note that a higher-order formalism can reveal more 
modularity in programs. It would be valuable to exploit such modularity in 
analyses as well. 

With higher-order LCTRSs, we would like to explore automatable solutions 
to the termination problem of functional programs in the same fashion as the 
first-order case [27,25], or even better, to the finding of their complexity by 
term rewriting. Moreover, given two programs supposedly implementing the 
same function, a method that derives whether they are indeed equivalent is 
also desirable. For example, a proof that the above two implementations of 
fact are equivalent may serve as a correctness proof of the latter, less intuitive 
implementation (which in general might be an outcome of code refactoring). Such 
methods have been explored in a first-order setting [12,7]. 

Higher-order LCTRSs will broaden the horizons of both LCTRSs and higher- 
order term rewriting. The eventual goal is to have a formalism that can be 
deployed to analyze both imperative and functional programs, so that through 
this formalism, the abundant techniques based on term rewriting may be applied 
to automatic program analysis. This paper is a step toward that goal. 


Contributions. The presentation begins with our perspective on higher-order 
term rewriting (without logical constraints) in Section 2. The contributions of 
this paper follow in subsequent sections: 


Higher-Order LCTRSs and Their Termination 333 


— We propose the formalism of logically constrained simply-typed term rewriting 
systems (LCSTRSs), a higher-order generalization of LCTRSs, in Section 3. 

— We adapt reduction orderings and rule removal to the newly proposed formal- 
ism, and define (as well as prove the soundness of) constrained HORPO—a 
variant of HORPO [21]—in Section 4. This includes changes to fit HORPO to 
curried notation and partial application, and to handle theory symbols and 
logical constraints in a similar way to RPO for first-order LCTRSs [27]. While 
this version of HORPO is not the most powerful higher-order termination 
technique, it offers a simple yet self-contained solution, and serves to illustrate 
how existing techniques may be extended. 

— We have developed for our formalism the foundation of a new open-source 
analysis tool, in which an implementation of constrained HORPO is provided. 
It requires several new insights, especially with regard to the way theories 
and logical constraints are handled, and is discussed in Section 6. 


2 Preliminaries 


One of the first problems that a student of higher-order term rewriting faces is 
the absence of a standard formalism on which the literature agrees. This variety 
reflects the diverse interests and needs held by different authors. 

In this section, we present simply-typed term rewriting systems (STRSs) [29] 
as the unconstrained basis of our formalism. This is one of the simplest higher- 
order formalisms, and closely resembles simple functional programs. We choose 
this formalism as our starting point because it is already powerful, while avoiding 
many of the complications that may be interesting for equational reasoning 
purposes but are not needed in program analysis, such as reduction modulo æ. 


Types and Terms. Types rule out undesired terms. We consider simple types:
given a non-empty set $\mathcal{S}$ of sorts (or base types), the set $\mathcal{T}$ of simple types
over $\mathcal{S}$ is generated by the grammar $\mathcal{T} ::= \mathcal{S} \mid (\mathcal{T} \to \mathcal{T})$. Right-associativity is
assigned to $\to$ so we can omit some parentheses. The order of a type $A$, denoted
by $\mathrm{ord}(A)$, is defined as follows: $\mathrm{ord}(A) = 0$ for $A \in \mathcal{S}$ and $\mathrm{ord}(A \to B) =
\max(\mathrm{ord}(A) + 1, \mathrm{ord}(B))$.
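
The grammar of simple types and the definition of the order can be transcribed directly. The following Python sketch is illustrative only; the class names Sort and Arrow and the helper arrow are assumptions, not notation from the formalism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sort:
    name: str

@dataclass(frozen=True)
class Arrow:
    left: object   # argument type
    right: object  # result type

def arrow(*types):
    """Right-associative arrow: arrow(A, B, C) builds A -> (B -> C)."""
    *args, result = types
    for arg in reversed(args):
        result = Arrow(arg, result)
    return result

def order(ty):
    """ord(A) = 0 for a sort A; ord(A -> B) = max(ord(A) + 1, ord(B))."""
    if isinstance(ty, Sort):
        return 0
    return max(order(ty.left) + 1, order(ty.right))
```

For instance, with nat = Sort("nat"), the type nat → nat → nat has order 1, while (nat → nat) → nat has order 2.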

Given disjoint sets $\mathcal{F}$ and $\mathcal{V}$, whose elements we call function symbols and
variables, respectively, the set $\mathfrak{T}$ of pre-terms over $\mathcal{F}$ and $\mathcal{V}$ is generated by the
grammar $\mathfrak{T} ::= \mathcal{F} \mid \mathcal{V} \mid (\mathfrak{T}\ \mathfrak{T})$. Left-associativity is assigned to the juxtaposition
operation, called application, so $t_0\ t_1\ t_2$ stands for $((t_0\ t_1)\ t_2)$, for example.

We assume that every function symbol and variable is assigned a unique
type. Typing works as expected: if pre-terms $t_0$ and $t_1$ have types $A \to B$ and
$A$, respectively, $t_0\ t_1$ has type $B$. The set of terms over $\mathcal{F}$ and $\mathcal{V}$, denoted by
$T(\mathcal{F}, \mathcal{V})$, is the subset of $\mathfrak{T}$ consisting of pre-terms with a type. We write $t : A$ if a
term $t$ has type $A$. The set of variables occurring in a term $t \in T(\mathcal{F}, \mathcal{V})$, denoted
by $\mathrm{Var}(t)$, is defined as follows: $\mathrm{Var}(f) = \emptyset$ for $f \in \mathcal{F}$, $\mathrm{Var}(x) = \{x\}$ for $x \in \mathcal{V}$
and $\mathrm{Var}(t_0\ t_1) = \mathrm{Var}(t_0) \cup \mathrm{Var}(t_1)$. A term $t$ is called ground if $\mathrm{Var}(t) = \emptyset$. The
set of ground terms over $\mathcal{F}$ is denoted by $T(\mathcal{F}, \emptyset)$.

Substitutions and Contexts. Variables occurring in a term can be seen as
placeholders: the occurrences of a variable may be replaced with terms which have the
same type as the variable does. Type-preserving mappings from $\mathcal{V}$ to $T(\mathcal{F}, \mathcal{V})$ are
called substitutions. Every substitution $\sigma$ extends to a type-preserving mapping
$\bar{\sigma}$ from $T(\mathcal{F}, \mathcal{V})$ to $T(\mathcal{F}, \mathcal{V})$. We write $t\sigma$ for $\bar{\sigma}(t)$ and define it as follows: $f\sigma = f$
for $f \in \mathcal{F}$, $x\sigma = \sigma(x)$ for $x \in \mathcal{V}$ and $(t_0\ t_1)\sigma = (t_0\sigma)\ (t_1\sigma)$.
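
The definitions of Var and of applying a substitution are structural recursions over terms, which the following Python sketch makes explicit. It is an untyped approximation: types are omitted, so type preservation is not checked, and the tuple encoding of terms is an assumption of this sketch.

```python
def fun(name):
    return ("fun", name)

def var(name):
    return ("var", name)

def app(head, *args):
    """Left-associative application: app(f, a, b) builds ((f a) b)."""
    term = head
    for arg in args:
        term = ("app", term, arg)
    return term

def variables(t):
    """Var(f) = {}, Var(x) = {x}, Var(t0 t1) = Var(t0) | Var(t1)."""
    if t[0] == "fun":
        return set()
    if t[0] == "var":
        return {t[1]}
    return variables(t[1]) | variables(t[2])

def substitute(t, sigma):
    """Apply sigma (a dict from variable names to terms) to t."""
    if t[0] == "fun":
        return t
    if t[0] == "var":
        return sigma.get(t[1], t)  # unmapped variables are left unchanged
    return ("app", substitute(t[1], sigma), substitute(t[2], sigma))

def is_ground(t):
    return not variables(t)
```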

Term formation gives rise to the concept of a context: a term containing a hole.
Formally, let $\square$ be a special terminal symbol denoting the hole, and the grammar
$\mathfrak{C} ::= \square \mid (\mathfrak{C}\ \mathfrak{T}) \mid (\mathfrak{T}\ \mathfrak{C})$ with the above rule for pre-terms $\mathfrak{T}$ generates pre-terms containing
exactly one occurrence of the hole. Given a type for the hole, a context is an
element of $\mathfrak{C}$ which is typed as a term is. Let $C[\,]_A$ denote a context in which the
hole has type $A$; filling the hole with a term $t : A$ produces the term $C[t]_A$ defined
as follows: $\square[t]_A = t$, $(C_0[\,]_A\ t_1)[t]_A = C_0[t]_A\ t_1$ and $(t_0\ C_1[\,]_A)[t]_A = t_0\ C_1[t]_A$.
We usually omit types in the above notation, and in $C[t]$, $t$ is understood as a
term which has the same type as the hole does.


Rules and Rewriting. Now we have all the ingredients in our recipe for
higher-order term rewriting. A rewrite rule $\ell \to r$ is an ordered pair of terms where $\ell$
and $r$ have the same type, $\mathrm{Var}(\ell) \supseteq \mathrm{Var}(r)$ and $\ell$ assumes the form $f\ t_1 \cdots t_n$ for
some function symbol $f$. Formally, a simply-typed term rewriting system (STRS)
is a quadruple $(\mathcal{S}, \mathcal{F}, \mathcal{V}, \mathcal{R})$ where every element of $\mathcal{F} \cup \mathcal{V}$ is assigned a simple
type over $\mathcal{S}$ and $\mathcal{R} \subseteq T(\mathcal{F}, \mathcal{V}) \times T(\mathcal{F}, \mathcal{V})$ is a set of rewrite rules. We usually let
$\mathcal{R}$ alone stand for the system and keep the details of term formation implicit.

The set $\mathcal{R}$ of rewrite rules induces the rewrite relation $\to_{\mathcal{R}} \subseteq T(\mathcal{F}, \mathcal{V}) \times
T(\mathcal{F}, \mathcal{V})$: $t \to_{\mathcal{R}} t'$ if and only if there exist a rewrite rule $\ell \to r \in \mathcal{R}$, a substitution
$\sigma$ and a context $C[\,]$ such that $t = C[\ell\sigma]$ and $t' = C[r\sigma]$. When there is no
ambiguity about the system in question, we may simply write $\to$ for $\to_{\mathcal{R}}$.

Given a relation $\to \subseteq X \times X$, an element $x$ of $X$ is called terminating with
respect to $\to$ if there is no infinite sequence $x = x_0 \to x_1 \to \cdots$, and $\to$ is called
well-founded if all the elements of $X$ are terminating with respect to $\to$. An STRS
$\mathcal{R}$ is called terminating if $\to_{\mathcal{R}}$ is well-founded.


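Example 1. The following rewrite rules constitute a terminating system:

$\mathsf{take}\ \mathsf{zero}\ l \to \mathsf{nil} \qquad \mathsf{take}\ n\ \mathsf{nil} \to \mathsf{nil} \qquad \mathsf{take}\ (\mathsf{suc}\ n)\ (\mathsf{cons}\ x\ l) \to \mathsf{cons}\ x\ (\mathsf{take}\ n\ l)$
where $\mathsf{zero} : \mathsf{nat}$, $\mathsf{suc} : \mathsf{nat} \to \mathsf{nat}$, $\mathsf{nil} : \mathsf{natlist}$, $\mathsf{cons} : \mathsf{nat} \to \mathsf{natlist} \to \mathsf{natlist}$ and
$\mathsf{take} : \mathsf{nat} \to \mathsf{natlist} \to \mathsf{natlist}$ are function symbols, and $l : \mathsf{natlist}$, $n : \mathsf{nat}$ and
$x : \mathsf{nat}$ are variables.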
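
To see the rules of Example 1 at work, the following Python sketch implements matching, a single rewrite step inside an arbitrary context, and normalization with a step budget. It is untyped and fixes one particular way of choosing the redex (root first, then left to right), which is only one of the choices that full rewriting permits; the term encoding and all helper names are assumptions.

```python
def fun(name): return ("fun", name)
def var(name): return ("var", name)

def app(head, *args):
    term = head
    for arg in args:
        term = ("app", term, arg)
    return term

def match(pattern, term, sigma):
    """Extend sigma so that pattern under sigma equals term, or return None."""
    if pattern[0] == "var":
        if pattern[1] in sigma:
            return sigma if sigma[pattern[1]] == term else None
        return {**sigma, pattern[1]: term}
    if pattern[0] == "fun":
        return sigma if pattern == term else None
    if term[0] != "app":
        return None
    sigma = match(pattern[1], term[1], sigma)
    return None if sigma is None else match(pattern[2], term[2], sigma)

def substitute(t, sigma):
    if t[0] == "fun":
        return t
    if t[0] == "var":
        return sigma.get(t[1], t)
    return ("app", substitute(t[1], sigma), substitute(t[2], sigma))

def rewrite_step(term, rules):
    """Return some reduct of term, or None if term is in normal form."""
    for lhs, rhs in rules:
        sigma = match(lhs, term, {})
        if sigma is not None:
            return substitute(rhs, sigma)
    if term[0] == "app":
        for i in (1, 2):  # rewrite inside the context around subterm i
            reduct = rewrite_step(term[i], rules)
            if reduct is not None:
                return term[:i] + (reduct,) + term[i + 1:]
    return None

def normalize(term, rules, max_steps=1000):
    for _ in range(max_steps):
        reduct = rewrite_step(term, rules)
        if reduct is None:
            return term
        term = reduct
    raise RuntimeError("step budget exhausted")

zero, suc, nil, cons, take = (fun(s) for s in ("zero", "suc", "nil", "cons", "take"))
n, x, l = var("n"), var("x"), var("l")
RULES = [
    (app(take, zero, l), nil),
    (app(take, n, nil), nil),
    (app(take, app(suc, n), app(cons, x, l)), app(cons, x, app(take, n, l))),
]
```

Normalizing take (suc zero) (cons zero (cons (suc zero) nil)) with these rules takes two steps and yields cons zero nil.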

Example 2. The following rewrite rule constitutes a non-terminating system:


$\mathsf{iterate}\ f\ x \to \mathsf{cons}\ x\ (\mathsf{iterate}\ f\ (f\ x))$


where $\mathsf{cons} : \mathsf{nat} \to \mathsf{natlist} \to \mathsf{natlist}$ and $\mathsf{iterate} : (\mathsf{nat} \to \mathsf{nat}) \to \mathsf{nat} \to \mathsf{natlist}$ are
function symbols, and $f : \mathsf{nat} \to \mathsf{nat}$ and $x : \mathsf{nat}$ are variables.

Limitations. The above formalism does not offer product types, polymorphism
or $\lambda$-abstractions. What it does offer is its already expressive syntax enabling us,
in a higher-order setting, to generalize LCTRSs and to discover what challenges
one may face when extending existing unconstrained techniques. We expect that,
once preliminary higher-order results are developed, we will adopt more features
from other higher-order formalisms in future extensions.

The exclusion of $\lambda$-abstractions does not rid us of first-class functions, thanks
to curried notation. For example, the occurrence of $\mathsf{suc}$ in $\mathsf{iterate}\ \mathsf{suc}\ \mathsf{zero}$ is
partially (in this case, not at all) applied and still forms a term, which can be
passed as an argument. Also, a term such as $\mathsf{iterate}\ (\lambda x.\, \mathsf{suc}\ (\mathsf{suc}\ x))\ \mathsf{zero}$ can be
simulated at the cost of an extra rewrite rule (in this case, $\mathsf{add2}\ x \to \mathsf{suc}\ (\mathsf{suc}\ x)$).
There are also straightforward ways of encoding product types.


Notions of Termination. If we combine the two systems from Examples 1 
and 2, the outcome is surely non-terminating: take zero (iterate suc zero) is not 
terminating, for example. From a Haskell programmer’s perspective, however, 
this term is “terminating” due to the non-strictness of Haskell. In general, every 
functional language uses a certain evaluation strategy to choose a specific redex, 
if any, to rewrite within a term, whereas the rewrite relation we define in this 
section corresponds to full rewriting: the redex is chosen non-deterministically. 
Furthermore, programmers usually care only about the termination of terms 
that are reachable from the entry point of a program and seldom consider full 
termination: the termination of all terms, i.e., the well-foundedness of the rewrite 
relation. We study full termination with respect to full rewriting in this paper, 
as it implies any other termination properties and full termination is often a 
prerequisite for determining properties such as confluence and equivalence. 


3 Logically Constrained STRSs 


Term rewriting systems do not have primitive data types built in; with some 
function symbols constructing (introducing) values of a certain type and pattern 
matching rules deconstructing (eliminating) those values, a term rewriting system 
relies on (co)inductively defined data types. While (co)inductive reasoning is 
straightforward this way, data types such as integers and arrays require encoding, 
which can be convoluted; think of the space-consuming unary representation of a 
number or a binary representation which takes less space but shifts the burden 
to rewrite rules defining arithmetic, and negative numbers bring up even more 
complications. Besides, such encoding neglects advances in modern SMT solvers. 

In this section, we extend unconstrained STRSs with logical constraints so 
that data types that are not (co)inductively defined can be represented directly, 
and analysis tools can take advantage of existing SMT solvers. We follow the 
ideas of first-order LCTRSs [27,12]. Specifically, we will consider systems over 
arbitrary first-order theories, i.e., we are not bound to, say, systems over integers, 
while avoiding higher-order logical constraints. In the unconstrained part of such
a system (outside theories), however, higher-order arguments and results are still
completely usable.


3.1 Terms Modulo Theories 


Following Section 2, we postulate a set $\mathcal{S}$ of sorts, a set $\mathcal{F}$ of function symbols
and a set $\mathcal{V}$ of variables where every element of $\mathcal{F} \cup \mathcal{V}$ is assigned a simple type
over $\mathcal{S}$. First, we assume that there is a distinguished subset $\mathcal{S}_\vartheta$ of $\mathcal{S}$, called the
set of theory sorts. The grammar $\mathcal{T}_\vartheta ::= \mathcal{S}_\vartheta \mid (\mathcal{S}_\vartheta \to \mathcal{T}_\vartheta)$ generates the set $\mathcal{T}_\vartheta$ of
theory types over $\mathcal{S}_\vartheta$. Note that the order of a theory type is never greater than
one. Next, we assume that there is a distinguished subset $\mathcal{F}_\vartheta$ of $\mathcal{F}$, called the
set of theory symbols, and that the type of every theory symbol is in $\mathcal{T}_\vartheta$, which
means that the type of any argument passed to a theory symbol is a theory sort.
Elements of $T(\mathcal{F}_\vartheta, \mathcal{V})$ are called theory terms. Last, for technical reasons, we
assume that there are infinitely many variables of each type.

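Theory symbols are interpreted in an underlying theory: given an $\mathcal{S}_\vartheta$-indexed
family of sets $(\mathfrak{X}_A)_{A \in \mathcal{S}_\vartheta}$, we extend it to a $\mathcal{T}_\vartheta$-indexed family by letting $\mathfrak{X}_{A \to B}$
be the set of mappings from $\mathfrak{X}_A$ to $\mathfrak{X}_B$; an interpretation of theory symbols is
a $\mathcal{T}_\vartheta$-indexed family of mappings $([\![\cdot]\!]_A)_{A \in \mathcal{T}_\vartheta}$ where $[\![\cdot]\!]_A$ assigns to each theory
symbol of type $A$ an element of $\mathfrak{X}_A$ and is bijective¹ if $A \in \mathcal{S}_\vartheta$. Theory symbols
whose type is a theory sort are called values. Given an interpretation of theory
symbols $([\![\cdot]\!]_A)_{A \in \mathcal{T}_\vartheta}$, we extend each indexed mapping $[\![\cdot]\!]_B$ to one that assigns
to each ground theory term of type $B$ an element of $\mathfrak{X}_B$ by letting $[\![t_0\ t_1]\!]_B$ be
$[\![t_0]\!]_{A \to B}([\![t_1]\!]_A)$. We usually write just $[\![\cdot]\!]$ when the type can be deduced.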


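Example 3. Let $\mathcal{S}_\vartheta$ be $\{\mathsf{int}\}$. Then $\mathsf{int} \to \mathsf{int} \to \mathsf{int}$ is a theory type over $\mathcal{S}_\vartheta$ while
$(\mathsf{int} \to \mathsf{int}) \to \mathsf{int}$ is not. Let $\mathcal{F}_\vartheta$ be $\{\mathsf{sub}\} \cup \{\bar{n} \mid n \in \mathbb{Z}\}$ where $\mathsf{sub} : \mathsf{int} \to \mathsf{int} \to \mathsf{int}$
and $\bar{n} : \mathsf{int}$. The values are the elements of $\{\bar{n} \mid n \in \mathbb{Z}\}$. Let $\mathfrak{X}_{\mathsf{int}}$ be $\mathbb{Z}$, $[\![\cdot]\!]_{\mathsf{int}}$ be
the mapping $\bar{n} \mapsto n$ and $[\![\mathsf{sub}]\!]$ be the mapping $\lambda m.\, \lambda n.\, m - n$. The interpretation
of $\mathsf{sub}\ \bar{1}$ is the mapping $\lambda n.\, 1 - n$.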
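
The extension of the interpretation from theory symbols to ground theory terms in Example 3 can be sketched in a few lines of Python. The encoding is an assumption of this sketch: Python integers play the role of the values, the string "sub" stands for the theory symbol, and a pair (t0, t1) stands for the application t0 t1.

```python
def interpret(term):
    """Interpretation of a ground theory term over the integers (Example 3)."""
    if isinstance(term, int):   # a value is interpreted as the integer it denotes
        return term
    if term == "sub":           # curried subtraction: m -> (n -> m - n)
        return lambda m: lambda n: m - n
    head, arg = term            # application: apply the meaning of t0 to that of t1
    return interpret(head)(interpret(arg))
```

In particular, `interpret(("sub", 1))` is a Python function mapping n to 1 - n, mirroring the interpretation of the partially applied term sub 1.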


We are not limited to the theory of integers: 


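Example 4. To reason about integer arrays, we could either represent them
as lists and simulate random access through more costly list traversal (which
affects the complexity), or consider a theory of bounded arrays as follows: Let
$\mathcal{S}_\vartheta$ be $\{\mathsf{int}, \mathsf{intarray}\}$ and $\mathcal{F}_\vartheta$ be the union of $\{\mathsf{size}, \mathsf{select}, \mathsf{store}\}$, $\{\bar{n} \mid n \in \mathbb{Z}\}$
and $\{\langle \bar{n}_0, \ldots, \bar{n}_{k-1} \rangle \mid k \in \mathbb{N} \text{ and } \forall i.\ n_i \in \mathbb{Z}\}$ where $\mathsf{size} : \mathsf{intarray} \to \mathsf{int}$, $\mathsf{select} :
\mathsf{intarray} \to \mathsf{int} \to \mathsf{int}$, $\mathsf{store} : \mathsf{intarray} \to \mathsf{int} \to \mathsf{int} \to \mathsf{intarray}$, $\bar{n} : \mathsf{int}$ and
$\langle \bar{n}_0, \ldots, \bar{n}_{k-1} \rangle : \mathsf{intarray}$. Let $\mathfrak{X}_{\mathsf{int}}$ and $\mathfrak{X}_{\mathsf{intarray}}$ be $\mathbb{Z}$ and $\mathbb{Z}^*$, respectively. Let $[\![\cdot]\!]_{\mathsf{int}}$
be the mapping $\bar{n} \mapsto n$ and $[\![\cdot]\!]_{\mathsf{intarray}}$ be the mapping $\langle \bar{n}_0, \ldots, \bar{n}_{k-1} \rangle \mapsto n_0 \ldots n_{k-1}$.
Let $[\![\mathsf{size}]\!](n_0 \ldots n_{k-1})$ be $k$. Let $[\![\mathsf{select}]\!](n_0 \ldots n_{k-1}, i)$ be $n_i$ if $0 \le i < k$, and
$0$ otherwise. Let $[\![\mathsf{store}]\!](n_0 \ldots n_{k-1}, i, m)$ be $n_0 \ldots n_{i-1}\, m\, n_{i+1} \ldots n_{k-1}$ if $0 \le
i < k$, and $n_0 \ldots n_{k-1}$ otherwise. Note that the values include theory symbols
$\langle \bar{n}_0, \ldots, \bar{n}_{k-1} \rangle : \mathsf{intarray}$ as well as $\bar{n} : \mathsf{int}$.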
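
The interpretations of size, select and store in Example 4 can be read as three small total functions. The Python sketch below uses tuples of integers for the elements of the array domain; this representation, and the uncurried argument lists, are assumptions made for readability.

```python
def size(a):
    """Number of elements k of the array n_0 ... n_{k-1}."""
    return len(a)

def select(a, i):
    """n_i if 0 <= i < k, and 0 otherwise (out-of-bounds reads are total)."""
    return a[i] if 0 <= i < len(a) else 0

def store(a, i, m):
    """The array with position i replaced by m if 0 <= i < k, else unchanged."""
    return a[:i] + (m,) + a[i + 1:] if 0 <= i < len(a) else a
```

Out-of-bounds accesses are therefore well defined: `select` returns 0 and `store` returns its array argument unchanged.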


1 The bijectivity is assumed so that values (see below) are isomorphic to (and therefore
a representation of) elements of $(\mathfrak{X}_A)_{A \in \mathcal{S}_\vartheta}$.




In this paper, we largely consider the theory of integers in Example 3 when 
giving examples because it is easy to understand. This particular theory does not 
play a special role for the formalism we will shortly present; in fact, the theory of 
bit vectors may be more appropriate to real-world programs using integers, and 
our formalism is not biased toward any choice of theories. In particular, we do 
not have to choose predefined theories from SMT-LIB [3]. The theory of bounded 
arrays in Example 4 is an instance of such a “non-standard” theory (which can 
nevertheless be encoded within the theory of functional arrays). On the other 
hand, theories supported by SMT solvers are preferable in light of automation. 


3.2 Constrained Rewriting 


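Constrained rewriting requires the theory sort $\mathsf{bool}$: we henceforth assume that
$\mathsf{bool} \in \mathcal{S}_\vartheta$, $\{\mathsf{f}, \mathsf{t}\} \subseteq \mathcal{F}_\vartheta$, $\mathfrak{X}_{\mathsf{bool}} = \{0, 1\}$, $[\![\mathsf{f}]\!]_{\mathsf{bool}} = 0$ and $[\![\mathsf{t}]\!]_{\mathsf{bool}} = 1$. A logical
constraint is a theory term $\varphi$ such that $\varphi$ has type $\mathsf{bool}$ and the type of each
variable in $\mathrm{Var}(\varphi)$ is a theory sort. A (constrained) rewrite rule is a triple $\ell \to r\ [\varphi]$
where $\ell$ and $r$ are terms which have the same type, $\varphi$ is a logical constraint,
the type of each variable in $\mathrm{Var}(r) \setminus \mathrm{Var}(\ell)$ is a theory sort and $\ell$ is a term that
assumes the form $f\ t_1 \cdots t_n$ for some function symbol $f$ and contains at least
one function symbol in $\mathcal{F} \setminus \mathcal{F}_\vartheta$.²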

This definition can be obscure at first glance, especially when compared with
its unconstrained counterpart in Section 2: variables which do not occur in $\ell$ are
allowed to occur in $r$, not to mention the logical constraint $\varphi$ as a brand-new
component. Given a rewrite rule $\ell \to r\ [\varphi]$, the idea is that variables occurring in
$\varphi$ are to be instantiated to values which make $\varphi$ true and other variables which
occur in $r$ but not in $\ell$ are to be instantiated to arbitrary values (note that the
type of each of these variables is a theory sort). Formally, given an interpretation
of theory symbols $[\![\cdot]\!]$, a substitution $\sigma$ is said to respect a rewrite rule $\ell \to r\ [\varphi]$
if $\sigma(x)$ is a value for all $x \in \mathrm{Var}(\varphi) \cup (\mathrm{Var}(r) \setminus \mathrm{Var}(\ell))$ and $[\![\varphi\sigma]\!] = 1$.
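
The two conditions for respecting a rewrite rule can be illustrated by a small Python sketch. It is not part of the formalism: the record `Rule`, the function `respects` and the sample rule are hypothetical names chosen for illustration, values are assumed to be modelled by Python integers, and the logical constraint is assumed to be given as a Python predicate over its own variables.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Rule:
    """Only the data needed for the check (hypothetical record, for illustration)."""
    lhs_vars: FrozenSet[str]          # variables of the left-hand side
    rhs_vars: FrozenSet[str]          # variables of the right-hand side
    constraint_vars: FrozenSet[str]   # variables of the logical constraint
    constraint: Callable[..., bool]   # the constraint as a predicate over its variables


def respects(sigma: dict, rule: Rule) -> bool:
    """Check whether the substitution sigma respects the rule."""
    must_be_values = rule.constraint_vars | (rule.rhs_vars - rule.lhs_vars)
    # Assumption: values are Python ints; any other object stands for a non-value term.
    if not all(isinstance(sigma.get(x), int) for x in must_be_values):
        return False
    return bool(rule.constraint(**{x: sigma[x] for x in rule.constraint_vars}))


# A made-up rule  f n x -> g x m [n > 0]  in which m occurs only on the right.
rule = Rule(frozenset({"n", "x"}), frozenset({"x", "m"}),
            frozenset({"n"}), lambda n: n > 0)

print(respects({"n": 3, "x": "y", "m": 7}, rule))    # n, m are values and 3 > 0
print(respects({"n": 0, "x": "y", "m": 7}, rule))    # the constraint is false
print(respects({"n": 3, "x": "y", "m": "z"}, rule))  # m is not mapped to a value
```

Note that `x` may be mapped to an arbitrary term: it occurs on both sides and not in the constraint, so no condition is imposed on it.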

We summarize all the above ingredients in the following definition: 


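Definition 1. A logically constrained STRS (LCSTRS) consists of $\mathcal{S}$, $\mathcal{S}_\vartheta$, $\mathcal{F}$,
$\mathcal{F}_\vartheta$, $\mathcal{V}$, $(\mathfrak{X}_A)$, $[\![\cdot]\!]$ and $\mathcal{R}$ where


1. $\mathcal{S}$ is a set of sorts,

2. $\mathcal{S}_\vartheta \subseteq \mathcal{S}$ is a set of theory sorts which contains $\mathsf{bool}$,

3. $\mathcal{F}$ is a set of function symbols in which every function symbol is assigned a
simple type over $\mathcal{S}$,

4. $\mathcal{F}_\vartheta \subseteq \mathcal{F}$ is a set of theory symbols in which the type of every theory symbol is
a theory type over $\mathcal{S}_\vartheta$, with $\mathfrak{f} : \mathsf{bool}$ and $\mathfrak{t} : \mathsf{bool}$ elements of $\mathcal{F}_\vartheta$,

5. $\mathcal{V}$ is a set of variables disjoint from $\mathcal{F}$ in which every variable is assigned
a simple type over $\mathcal{S}$ and there are infinitely many variables to which every
type is assigned,

6. $(\mathfrak{X}_A)$ is an $\mathcal{S}_\vartheta$-indexed family of sets such that $\mathfrak{X}_{\mathsf{bool}} = \{0, 1\}$,

7. $[\![\cdot]\!]$ is an interpretation of theory symbols such that $[\![\mathfrak{f}]\!] = 0$ and $[\![\mathfrak{t}]\!] = 1$, and

8. $\mathcal{R} \subseteq T(\mathcal{F}, \mathcal{V}) \times T(\mathcal{F}, \mathcal{V}) \times T(\mathcal{F}_\vartheta, \mathcal{V})$ is a set of rewrite rules.


2 We do not require $f$ to be in $\mathcal{F} \setminus \mathcal{F}_\vartheta$ (that is, $f$ can be a theory symbol) because a
theory symbol may occur at the head position of a rewrite rule’s left-hand side in
rewriting induction, and this general definition is in line with first-order LCTRSs.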


We usually let R alone stand for the system. 
And the following definition concludes the elaboration of constrained rewriting: 


Definition 2. Given an LCSTRS $\mathcal{R}$, the set of rewrite rules induces the rewrite
relation $\to_{\mathcal{R}}\ \subseteq T(\mathcal{F}, \mathcal{V}) \times T(\mathcal{F}, \mathcal{V})$ such that $t \to_{\mathcal{R}} t'$ if and only if one of the
following conditions is true:


1. There exist a rewrite rule $\ell \to r\ [\varphi] \in \mathcal{R}$, a substitution $\sigma$ which respects
$\ell \to r\ [\varphi]$ and a context $C[]$ such that $t = C[\ell\sigma]$ and $t' = C[r\sigma]$.

2. There exist theory symbols $v_1 : A_1, \ldots, v_n : A_n$, $v' : B$ and $f : A_1 \to \cdots \to A_n \to B$
for $n > 0$ and $A_1, \ldots, A_n, B \in \mathcal{S}_\vartheta$ such that $[\![f\ v_1 \cdots v_n]\!] = [\![v']\!]$,
and a context $C[]$ such that $t = C[f\ v_1 \cdots v_n]$ and $t' = C[v']$.


Note that the above conditions are mutually exclusive for any given context $C[]$:
$f\ v_1 \cdots v_n$ is a theory term, whereas $\ell$ in any rewrite rule $\ell \to r\ [\varphi]$ is not. If
$t \to_{\mathcal{R}} t'$ due to the second condition, we also write $t \to_\kappa t'$ and call it a step of
calculation. When no ambiguity arises, we may simply write $\to$ for $\to_{\mathcal{R}}$.


Example 5. We can rework Example 1 into an LCSTRS: 


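$$\mathsf{take}\ n\ l \to \mathsf{nil}\ [n \le 0] \qquad \mathsf{take}\ n\ \mathsf{nil} \to \mathsf{nil}$$
$$\mathsf{take}\ n\ (\mathsf{cons}\ x\ l) \to \mathsf{cons}\ x\ (\mathsf{take}\ (n - 1)\ l)\ [n > 0]$$


where $\mathcal{S} = \mathcal{S}_\vartheta \cup \{\mathsf{intlist}\}$, $\mathcal{S}_\vartheta = \{\mathsf{bool}, \mathsf{int}\}$, $\mathcal{F} = \mathcal{F}_\vartheta \cup \{\mathsf{nil}, \mathsf{cons}, \mathsf{take}\}$,
$\mathcal{F}_\vartheta = \{\le, >, -, \mathfrak{f}, \mathfrak{t}\} \cup \mathbb{Z}$, $\mathcal{V} \supseteq \{l, n, x\}$, $\le\ : \mathsf{int} \to \mathsf{int} \to \mathsf{bool}$, $>\ : \mathsf{int} \to \mathsf{int} \to \mathsf{bool}$,
$-\ : \mathsf{int} \to \mathsf{int} \to \mathsf{int}$, $\mathfrak{f} : \mathsf{bool}$, $\mathfrak{t} : \mathsf{bool}$, $v : \mathsf{int}$ for all $v \in \mathbb{Z}$, $\mathsf{nil} : \mathsf{intlist}$,
$\mathsf{cons} : \mathsf{int} \to \mathsf{intlist} \to \mathsf{intlist}$, $\mathsf{take} : \mathsf{int} \to \mathsf{intlist} \to \mathsf{intlist}$, $l : \mathsf{intlist}$, $n : \mathsf{int}$ and $x : \mathsf{int}$.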

Here and henceforth we let integer literals and operators, e.g., $0$, $1$, $\le$, $>$ and
$-$, denote both the corresponding theory symbols and their respective images
under the interpretation, in contrast to Examples 3 and 4, where we pedantically
make a distinction between, say, the theory symbol 1 and the number 1. We also use
infix notation for some binary operators to improve readability, and omit the
logical constraint of a rewrite rule when it is $\mathfrak{t}$. Below is a rewrite sequence:


$$\mathsf{take}\ 1\ (\mathsf{cons}\ x\ (\mathsf{cons}\ y\ l)) \to \mathsf{cons}\ x\ (\mathsf{take}\ (1 - 1)\ (\mathsf{cons}\ y\ l))$$
$$\to_\kappa \mathsf{cons}\ x\ (\mathsf{take}\ 0\ (\mathsf{cons}\ y\ l)) \to \mathsf{cons}\ x\ \mathsf{nil}$$
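
The rewrite sequence of Example 5 can be replayed mechanically. The following Python sketch is only an illustration under several assumptions of ours: terms are encoded as nested tuples `(head, arg, ...)`, integer values are Python integers, variables and nullary symbols are strings, and the helper names `step` and `normalize` as well as the leftmost choice of redex are hypothetical (an LCSTRS itself fixes no evaluation strategy).

```python
def step(t):
    """Return one reduct of t, or None if no rule or calculation step applies."""
    if not isinstance(t, tuple):
        return None
    head, *args = t
    # Calculation step: the theory symbol "-" applied to values only.
    if head == "-" and all(isinstance(a, int) for a in args):
        return args[0] - args[1]
    if head == "take":
        n, lst = args
        if isinstance(n, int):  # n occurs in the constraints, so it must be a value
            if n <= 0:                                        # take n l -> nil [n <= 0]
                return "nil"
            if isinstance(lst, tuple) and lst[0] == "cons":   # third rule, [n > 0]
                _, x, rest = lst
                return ("cons", x, ("take", ("-", n, 1), rest))
        if lst == "nil":                                      # take n nil -> nil
            return "nil"
    # Otherwise rewrite inside the leftmost reducible argument.
    for i, a in enumerate(args):
        r = step(a)
        if r is not None:
            return (head, *args[:i], r, *args[i + 1:])
    return None


def normalize(t):
    """Collect the terms visited until no further step is possible."""
    trace = [t]
    while (r := step(t)) is not None:
        t = r
        trace.append(t)
    return trace


for term in normalize(("take", 1, ("cons", "x", ("cons", "y", "l")))):
    print(term)
```

The check `isinstance(n, int)` reflects the requirement that variables of a logical constraint be instantiated to values: with a variable in place of `1`, neither constrained rule would fire.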


Example 6. In Section 1, the rewrite rules implementing the factorial function
by fold constitute an LCSTRS. Below is a rewrite sequence:


$$\mathsf{fact}\ 1 \to \mathsf{fold}\ (*)\ 1\ (\mathsf{genlist}\ 1) \to \mathsf{fold}\ (*)\ 1\ (\mathsf{cons}\ 1\ (\mathsf{genlist}\ (1 - 1)))$$
$$\to_\kappa \mathsf{fold}\ (*)\ 1\ (\mathsf{cons}\ 1\ (\mathsf{genlist}\ 0)) \to \mathsf{fold}\ (*)\ 1\ (\mathsf{cons}\ 1\ \mathsf{nil})$$
$$\to (*)\ 1\ (\mathsf{fold}\ (*)\ 1\ \mathsf{nil}) \to (*)\ 1\ 1 \to_\kappa 1$$


Example 7. Consider the rewrite rule $\mathsf{readint} \to n$, in which the variable $n : \mathsf{int}$
occurs on the right-hand side of $\to$ but not on the left. Unconstrained STRSs do
not permit such a rewrite rule, but LCSTRSs do. It looks as if we might rewrite
$\mathsf{readint}$ to a variable but it is not the case: all the substitutions which respect
this rewrite rule must map $n$ to a value. Indeed, $\mathsf{readint}$ is always rewritten to a
value of type $\mathsf{int}$. We may have, say, $\mathsf{readint} \to 42$. Such variables can be used to
model user input.


Example 8. Getting input by means of the rewrite rule from Example 7 has 
one flaw: in case of multiple integers to be read, the order of reading each is 
non-deterministic. Even in the presence of an evaluation strategy, the order may 
not be the desired one. We can use continuation-passing style to choose an order: 


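$$\mathsf{readint}\ k \to k\ n \qquad \mathsf{comp}\ g\ f\ x \to g\ (f\ x) \qquad \mathsf{sub} \to \mathsf{readint}\ (\mathsf{comp}\ \mathsf{readint}\ (-))$$


where $\mathsf{comp} : ((\mathsf{int} \to \mathsf{int}) \to \mathsf{int}) \to (\mathsf{int} \to \mathsf{int} \to \mathsf{int}) \to \mathsf{int} \to \mathsf{int}$. If the first and
the second integers to be read were 1 and 2, respectively, the following rewrite
sequence would be the only one starting from $\mathsf{sub}$:


$$\mathsf{sub} \to \mathsf{readint}\ (\mathsf{comp}\ \mathsf{readint}\ (-)) \to \mathsf{comp}\ \mathsf{readint}\ (-)\ 1$$
$$\to \mathsf{readint}\ ((-)\ 1) \to (-)\ 1\ 2 \to_\kappa -1$$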


Since there is no way to specify the actual input within an LCSTRS, rewrite 
sequences such as the one above cannot be derived deterministically. Nevertheless, 
this example demonstrates that the newly proposed formalism can represent 
relatively sophisticated control mechanisms utilized by functional programs. 


Remarks. We reflect on some of the concepts presented in this section: 


— We use the phrase “terms modulo theories” in line with “satisfiability modulo 
theories”: some function symbols are interpreted within a theory. While such 
an interpretation gives rise to a way of identifying certain terms, namely those 
that are convertible to each other with respect to $\to_\kappa$, we do not consider
them identified (in other words, modulo $\leftrightarrow^*_\kappa$) in this paper.

— First-order LCTRSs can be seen as instances of the newly proposed formalism, 
i.e., ones in which the type order of each function symbol is no greater than 
one, variables with a non-zero type order (i.e., higher-order variables) are 
excluded, and the type of both sides of a rewrite rule is always a sort. 

— Logical constraints are essentially first-order: the type order of a theory 
symbol cannot be greater than one while higher-order variables are excluded. 
This restriction rules out, for example, the following implementation: 


$$\mathsf{filter}\ f\ (\mathsf{cons}\ x\ l) \to \mathsf{cons}\ x\ (\mathsf{filter}\ f\ l)\ [f\ x] \qquad \mathsf{filter}\ f\ \mathsf{nil} \to \mathsf{nil}$$
$$\mathsf{filter}\ f\ (\mathsf{cons}\ x\ l) \to \mathsf{filter}\ f\ l\ [\neg (f\ x)]$$

The filter function can actually be implemented in an LCSTRS as follows:


$$\mathsf{filter}\ f\ (\mathsf{cons}\ x\ l) \to \mathsf{if}\ (f\ x)\ (\mathsf{cons}\ x\ (\mathsf{filter}\ f\ l))\ (\mathsf{filter}\ f\ l)$$
$$\mathsf{filter}\ f\ \mathsf{nil} \to \mathsf{nil} \qquad \mathsf{if}\ \mathfrak{t}\ l\ l' \to l \qquad \mathsf{if}\ \mathfrak{f}\ l\ l' \to l'$$

In the former implementation, the problem is not the higher-order variable
$f$ itself but its occurrence in logical constraints. In this case, because the
filter function is usually meant to be used in combination with “user-defined”
predicates (which are function symbols defined by rewrite rules and therefore
do not belong to the theories), it makes sense to disallow $f$ from occurring in
logical constraints. In general, we may encounter use cases for higher-order
constraints; until then, we focus on first-order constraints, which are very
common in functional programs.


4 Constrained Higher-Order Recursive Path Ordering 


Recall that an important part of our goal is to allow the abundant term rewriting 
techniques to be applied toward program analysis. We have defined a formalism 
for constrained higher-order term rewriting; now it remains to be seen that—or 
how—existing techniques can be extended to it. 

In the rest of this paper, we consider termination, an important aspect 
of program analysis and a topic that has been studied by the term rewriting 
community for decades. Not only is termination itself critical to the correctness of 
certain programs, but it also facilitates other analyses by admitting well-founded 
induction on terms. 

In this section, we adapt HORPO [21] to our formalism. This is one of the 
oldest, yet still effective techniques for higher-order termination. HORPO can be 
used either as a stand-alone method or in a higher-order version of the dependency 
pair framework [1,39,11,25]. Hence, this adaptation offers a solid basis for use in 
an analysis tool’s termination module. We will discuss the use of HORPO within 
the dependency pair framework in Section 5, and its automation in Section 6. 

Constrained RPO for first-order LCTRSs was proposed in [27]. We take 
inspiration from it for its approach to theory terms, formalize the ideas, and add 
support for (higher) types as well as partial application. 


4.1 HORPO, Unconstrained and Uncurried 


We first recall HORPO in its original form. Note that the original definition is 
based on an unconstrained and uncurried format, and a thorough discussion on it 
is beyond the scope of this paper. The following presentation is mostly informal 
and only serves the purposes of comparison and inspiration. 

We begin with two standard definitions: 


Definition 3. Given relations $\succsim$ and $\succ$ over $X$, the generalized lexicographic
ordering $\succ^{\mathsf{l}}\ \subseteq X^* \times X^*$ is induced as follows: $x_1 \ldots x_m \succ^{\mathsf{l}} y_1 \ldots y_n$ if and only
if there exists $k \le \min(m, n)$ such that $x_i \succsim y_i$ for all $i < k$ and $x_k \succ y_k$.


Definition 4. Given relations $\succsim$ and $\succ$ over $X$, the generalized multiset ordering
$\succ^{\mathsf{m}}\ \subseteq X^* \times X^*$ is induced as follows: $x_1 \ldots x_m \succ^{\mathsf{m}} y_1 \ldots y_n$ if and only if there
exist a non-empty subset $I$ of $\{1, \ldots, m\}$ and a mapping $\pi$ from $\{1, \ldots, n\}$ to
$\{1, \ldots, m\}$ such that

1. $\forall i \in I.\ \forall j \in \pi^{-1}(i).\ x_i \succ y_j$,
2. $\forall i \in \{1, \ldots, m\} \setminus I.\ \forall j \in \pi^{-1}(i).\ x_i \succsim y_j$, and
3. $\forall i \in \{1, \ldots, m\} \setminus I.\ |\pi^{-1}(i)| = 1$.


Here the generalized multiset ordering is formulated in terms of lists because 
we will compare argument lists by this ordering and this formulation facilitates 
implementation. In the following definition of HORPO, when we refer to the 
above definitions, $\succsim$ is the equality over terms and $\succ$ is HORPO itself.
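
Since the list-based formulation is motivated by implementation, a direct transcription of Definitions 3 and 4 may be helpful. The Python sketch below is ours and makes no claim about how any tool implements these orderings: the names `lex_gt` and `mul_gt` are hypothetical, the two base relations are passed as Boolean functions `geq` and `gt`, and the multiset case simply enumerates all mappings, which is exponential and only meant to mirror the definition.

```python
from itertools import product


def lex_gt(xs, ys, geq, gt):
    """Generalized lexicographic ordering: some position k is strictly decreasing
    and all earlier positions are weakly decreasing."""
    for k in range(min(len(xs), len(ys))):
        if gt(xs[k], ys[k]):
            return True       # every earlier position passed the geq test
        if not geq(xs[k], ys[k]):
            return False      # no later position can serve as k
    return False


def mul_gt(xs, ys, geq, gt):
    """Generalized multiset ordering, by brute force over all mappings pi."""
    m, n = len(xs), len(ys)
    if m == 0:
        return False          # the index set I must be non-empty
    for pi in product(range(m), repeat=n):   # pi[j] is the index that y_j is mapped to
        pre = [[j for j in range(n) if pi[j] == i] for i in range(m)]
        # i may belong to I if x_i is strictly above everything mapped to it
        strict = [all(gt(xs[i], ys[j]) for j in pre[i]) for i in range(m)]
        # i may stay outside I if exactly one y_j is mapped to it and x_i >= y_j
        weak = [len(pre[i]) == 1 and geq(xs[i], ys[pre[i][0]]) for i in range(m)]
        if all(s or w for s, w in zip(strict, weak)) and any(strict):
            return True
    return False


# Base relations on integers, chosen only for this demonstration.
gt = lambda a, b: a > 0 and a > b
geq = lambda a, b: a == b or gt(a, b)

print(lex_gt([3, 1], [3, 0], geq, gt))
print(mul_gt([1, 3], [2, 2], geq, gt))
print(mul_gt([1], [1], geq, gt))
```

Passing `geq` and `gt` separately matters because the weak relation is not required to be the reflexive closure of the strict one.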

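Roughly, HORPO extends a given ordering over function symbols, and when
considering terms headed by the same function symbol, compares the arguments
by either of the above orderings. Given a well-founded ordering $\blacktriangleright\ \subseteq \mathcal{F} \times \mathcal{F}$,
called the precedence, and a mapping $\mathsf{s} : \mathcal{F} \to \{\mathsf{l}, \mathsf{m}\}$, called the status, HORPO
is a type-preserving relation $\succ$ such that $s \succ t$ if and only if one of the following
conditions is true:


1. $s = f(s_1, \ldots, s_m)$, $f \in \mathcal{F}$ and $\exists k.\ s_k \succeq t$.

2. $s = f(s_1, \ldots, s_m)$, $f \in \mathcal{F}$, $t = @(\ldots @(@(t_0, t_1), t_2) \ldots, t_n)$ and
$\forall i.\ s \succ t_i \lor \exists k.\ s_k \succeq t_i$.

3. $s = f(s_1, \ldots, s_m)$, $t = g(t_1, \ldots, t_n)$, $f \in \mathcal{F}$, $g \in \mathcal{F}$, $f \blacktriangleright g$ and
$\forall i.\ s \succ t_i \lor \exists k.\ s_k \succeq t_i$.

4. $s = f(s_1, \ldots, s_m)$, $t = f(t_1, \ldots, t_m)$, $f \in \mathcal{F}$, $\mathsf{s}(f) = \mathsf{l}$, $s_1 \ldots s_m \succ^{\mathsf{l}} t_1 \ldots t_m$ and
$\forall i.\ s \succ t_i \lor \exists k.\ s_k \succeq t_i$.

5. $s = f(s_1, \ldots, s_m)$, $t = f(t_1, \ldots, t_m)$, $f \in \mathcal{F}$, $\mathsf{s}(f) = \mathsf{m}$ and
$s_1 \ldots s_m \succ^{\mathsf{m}} t_1 \ldots t_m$.

6. $s = @(s_0, s_1)$, $t = @(t_0, t_1)$ and $s_0 s_1 \succ^{\mathsf{m}} t_0 t_1$.

7. $s = \lambda x.\ s_0$, $t = \lambda x.\ t_0$ and $s_0 \succ t_0$.


Here $\succeq$ denotes the reflexive closure of $\succ$.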
We call this format uncurried because every function symbol has an arity, i.e.,
the number of arguments guaranteed for each occurrence of the function symbol
in a term. This is indicated by the functional notation $f(s_1, \ldots, s_m)$ as opposed to
$f\ s_1 \cdots s_m$. If $f$ has arity $m$, its occurrence in a term must take $m$ arguments so
$f(s_1, \ldots, s_{m-1})$ is not a well-formed term, for example. A function symbol’s type
(or more technically, its type declaration) can permit more arguments than its
arity guarantees. Such an extra argument is supplied through the syntactic form
$@(\cdot, \cdot)$. For example, if the same function symbol $f$ is given an extra argument
$s_{m+1}$, we write $@(f(s_1, \ldots, s_m), s_{m+1})$. This syntactic form is also used to pass
arguments to variables and $\lambda$-abstractions.

The difference between an uncurried and a curried format is more than a 
notational issue, and poses technical challenges to our extension of HORPO. 
Another source of challenges is, as one would expect, constrained rewriting. 


4.2 Rule Removal 


HORPO is defined as a reduction ordering $\succ$, which is a type-preserving, stable
(i.e., $t \succ t'$ implies $t\sigma \succ t'\sigma$), monotonic (i.e., $t \succ t'$ implies $C[t] \succ C[t']$) and
well-founded relation. Note that despite its name, HORPO is not necessarily
transitive. If such a relation orients all the rewrite rules in $\mathcal{R}$ (i.e., $\ell \succ r$ for all
$\ell \to r \in \mathcal{R}$), we can conclude that the rewrite relation $\to_{\mathcal{R}}$ is well-founded.

A similar strategy for LCSTRSs requires a few tweaks. First, stability should 
be tightly coupled with rule orientation because every rewrite rule now is equipped 
with a logical constraint, which decides what substitutions are expected when the 
rewrite rule is applied. Second, the monotonicity requirement can be weakened 
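because $\ell$ is never a theory term in a rewrite rule $\ell \to r\ [\varphi]$. We define as follows:


Definition 5. A type-preserving relation $\Rightarrow\ \subseteq T(\mathcal{F}, \mathcal{V}) \times T(\mathcal{F}, \mathcal{V})$ is said


1. to stably orient a rewrite rule $\ell \to r\ [\varphi]$ if $\ell\sigma \Rightarrow r\sigma$ for each substitution $\sigma$
which respects the rewrite rule, and
2. to be rule-monotonic if $t \Rightarrow t'$ implies $C[t] \Rightarrow C[t']$ when $t \notin T(\mathcal{F}_\vartheta, \mathcal{V})$.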


Besides having rewrite rules stably oriented, we need to deal with calculation.
It turns out to be unnecessary to search for a well-founded relation which includes
$\to_\kappa$, given the following observation:


Lemma 1. $\to_\kappa$ is well-founded.


Proof. The term size strictly decreases through every step of calculation. 
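
The size argument can be observed on a small instance. In the following Python sketch, which is an illustration of ours rather than part of the proof, terms are nested tuples, values are Python integers, and the table `THEORY` together with the helpers `size` and `calc_step` are hypothetical names; only three binary integer operators are modelled.

```python
import operator

# Assumed sample of theory symbols, each interpreted by a Python function.
THEORY = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def size(t):
    """Number of symbol occurrences in a term."""
    return 1 + sum(size(a) for a in t[1:]) if isinstance(t, tuple) else 1


def calc_step(t):
    """Perform one calculation step somewhere in t, or return None."""
    if not isinstance(t, tuple):
        return None
    head, *args = t
    if head in THEORY and len(args) == 2 and all(isinstance(a, int) for a in args):
        return THEORY[head](*args)   # a theory symbol applied to values becomes a value
    for i, a in enumerate(args):
        r = calc_step(a)
        if r is not None:
            return (head, *args[:i], r, *args[i + 1:])
    return None


t = ("*", ("-", 4, 1), ("+", 1, 1))
sizes = [size(t)]
while (r := calc_step(t)) is not None:
    t = r
    sizes.append(size(t))
print(t, sizes)
```

Each step replaces a subterm consisting of a theory symbol and at least one value by a single value, so the recorded sizes form a strictly decreasing sequence.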


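We rather look for a type-preserving and well-founded relation $\succ$ which stably
orients every rewrite rule, is rule-monotonic, and is compatible with $\to_\kappa$, i.e.,
$\to_\kappa ; \succ\ \subseteq\ \succ^+$ or $\succ ; \to_\kappa\ \subseteq\ \succ^+$. This strategy is an instance of rule removal:


Theorem 1. Given an LCSTRS $\mathcal{R}$, the rewrite relation $\to_{\mathcal{R}}$ is well-founded
if and only if there exist sets $\mathcal{R}_1$ and $\mathcal{R}_2$ such that $\to_{\mathcal{R}_1}$ is well-founded and
$\mathcal{R}_1 \cup \mathcal{R}_2 = \mathcal{R}$, and type-preserving, rule-monotonic relations $\succsim$ and $\succ$ such that


1. $\succsim$ includes $\to_\kappa$ and stably orients every rewrite rule in $\mathcal{R}_1$,
2. $\succ$ is well-founded and stably orients every rewrite rule in $\mathcal{R}_2$, and
3. $\succsim ; \succ\ \subseteq\ \succ^+$ or $\succ ; \succsim\ \subseteq\ \succ^+$.


Here $\to_{\mathcal{R}_1}$ assumes the same term formation and interpretation as $\to_{\mathcal{R}}$ does.


Proof. If $\to_{\mathcal{R}}$ is well-founded, take $\mathcal{R}_1 = \emptyset$, $\mathcal{R}_2 = \mathcal{R}$, $\succsim\ =\ \to_\kappa$ and $\succ\ =\ \to_{\mathcal{R}}$.
Note that $\to_\emptyset\ =\ \to_\kappa$ by definition.

Now assume given $\mathcal{R}_1$, $\mathcal{R}_2$, $\succsim$ and $\succ$. Since $\succsim$ is rule-monotonic, includes
$\to_\kappa$, and stably orients every rewrite rule in $\mathcal{R}_1$, $\to_{\mathcal{R}_1}\ \subseteq\ \succsim$. So the compatibility
of $\succ$ with $\succsim$ implies its compatibility with $\to_{\mathcal{R}_1}$, which in turn implies the
well-foundedness of $\to_{\mathcal{R}_1} \cup \succ$, given that both $\to_{\mathcal{R}_1}$ and $\succ$ are well-founded.
Since $\mathcal{R}_1 \cup \mathcal{R}_2 = \mathcal{R}$ and $\succ$ is a rule-monotonic relation which stably orients every
rewrite rule in $\mathcal{R}_2$, $\to_{\mathcal{R}}\ \subseteq\ \to_{\mathcal{R}_1} \cup \succ$. Hence, $\to_{\mathcal{R}}$ is well-founded.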


In a termination proof of $\mathcal{R}$, Theorem 1 allows us to remove rewrite rules that
are in $\mathcal{R}_2$ from $\mathcal{R}$. If none of the rewrite rules are left after iterations of rule
removal, the termination of the original system can be concluded with Lemma 1.


4.3 Constrained HORPO for LCSTRSs


Before adapting HORPO for LCSTRSs, we discuss how the theories may be 
handled. Let us consider the following system: 


$$\mathsf{rec}\ n\ x\ f \to x\ [n \le 0] \qquad \mathsf{rec}\ n\ x\ f \to f\ (n - 1)\ (\mathsf{rec}\ (n - 1)\ x\ f)\ [n > 0]$$


where $\mathsf{rec} : \mathsf{int} \to \mathsf{int} \to (\mathsf{int} \to \mathsf{int} \to \mathsf{int}) \to \mathsf{int}$. In the second rewrite rule,
the left-hand side of $\to$ is $\mathsf{rec}\ n\ x\ f$ while the right-hand side has a subterm
$\mathsf{rec}\ (n - 1)\ x\ f$. It is natural to expect $n \succ n - 1$ in the construction of HORPO.
Note that this is impossible with respect to any recursive path ordering for
unconstrained rewriting because $n$ is a variable occurring in $n - 1$; in an
unconstrained setting, we actually have $n - 1 \succ n$. Hence, we must somehow take the
logical constraint $n > 0$ into account. To this end, we largely follow the ideas of
constrained RPO for first-order LCTRSs [27].

The occurrence of $n$ in the logical constraint ensures that $n$ is instantiated
to a value, say 42, when the rewrite rule is applied, and it is sensible to have
$42 \succ 42 - 1$. Also, $n > 0$ guarantees that all the sequences of such descents are
finite, i.e., the ordering $\lambda m.\ \lambda n.\ m > 0 \land m > n$, denoted by $\sqsupset$, is well-founded.
Let $\varphi \models \varphi'$ denote, on the assumption that $\varphi$ and $\varphi'$ are logical constraints such
that $\mathrm{Var}(\varphi) \supseteq \mathrm{Var}(\varphi')$, that $[\![\varphi\sigma]\!] = 1$ implies $[\![\varphi'\sigma]\!] = 1$ for each substitution $\sigma$
which maps variables in $\mathrm{Var}(\varphi)$ to values. Then we have $n > 0 \models n \sqsupset n - 1$. We
thus would like to have $s \succ t$ if $\varphi \models s \sqsupset t$.

However, with the same ordering $\sqsupset$, we have both $m > 0 \land m > n \models m \sqsupset n$
and $n > 0 \land n > m \models n \sqsupset m$, whereas we cannot have both $m \succ n$ and $n \succ m$
without breaking the well-foundedness of $\succ$. To resolve this issue, we split $\succ$ into
a family of relations $(\succ_\varphi)$ indexed by logical constraints, and let $s \succ_\varphi t$ be true if
$\varphi \models s \sqsupset t$. We also introduce a separate family of relations $(\succeq_\varphi)$ such that $s \succeq_\varphi t$
if $\varphi \models s \sqsupseteq t$ where $\sqsupseteq$ is the reflexive closure of $\sqsupset$. Hence, $\succeq_\varphi$ is not necessarily
the reflexive closure of $\succ_\varphi$; if it was, even $n \succeq_{n \ge 1} 1$ would not be obtainable.
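
The entailments discussed above can be sanity-checked on a finite range of integers. The Python sketch below is only such a bounded check and not a proof; an implementation would presumably delegate these questions to an SMT solver. The names `sq_gt`, `sq_geq`, `entails` and `descent_length`, as well as the bound of 20, are assumptions made for this illustration.

```python
from itertools import product


def sq_gt(m: int, n: int) -> bool:
    """The well-founded ordering of the running example: m > 0 and m > n."""
    return m > 0 and m > n


def sq_geq(m: int, n: int) -> bool:
    """Its reflexive closure."""
    return m == n or sq_gt(m, n)


def entails(constraint, conclusion, arity: int, bound: int = 20) -> bool:
    """Bounded check that every assignment in [-bound, bound]^arity which
    satisfies the constraint also satisfies the conclusion."""
    values = range(-bound, bound + 1)
    return all(conclusion(*v)
               for v in product(values, repeat=arity)
               if constraint(*v))


def descent_length(n: int) -> int:
    """Number of sq_gt descents in the chain n, n - 1, n - 2, ..."""
    steps = 0
    while sq_gt(n, n - 1):
        n, steps = n - 1, steps + 1
    return steps


# Under n > 0 the descent from n to n - 1 is justified; without it, it is not.
print(entails(lambda n: n > 0, lambda n: sq_gt(n, n - 1), 1))
print(entails(lambda n: True, lambda n: sq_gt(n, n - 1), 1))
# Under n >= 1 the weak comparison with 1 holds but the strict one fails at n = 1.
print(entails(lambda n: n >= 1, lambda n: sq_geq(n, 1), 1))
print(entails(lambda n: n >= 1, lambda n: sq_gt(n, 1), 1))
print(descent_length(42))
```

The last two checks illustrate why the weak relation is indexed and defined separately: the weak comparison with 1 is entailed by the constraint although the two terms are neither strictly ordered nor syntactically equal.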

Now we have a family of pairs $(\succeq_\varphi, \succ_\varphi)$, which does not seem to suit rule
removal; after all, the essential requirement is a fixed relation which is
type-preserving, rule-monotonic, well-founded and at least compatible with $\to_\kappa$. When
the definition of constrained HORPO is fully presented, we will show that $\succ_{\mathfrak{t}}$ (the
irreflexive relation indexed by the boolean $\mathfrak{t}$) is such a relation and stably orients
a rewrite rule $\ell \to r\ [\varphi]$ if $\ell \succ_\varphi r$.

The annotation $\varphi$ of HORPO does not capture variables in $\mathrm{Var}(r) \setminus \mathrm{Var}(\ell)$,
which also have a part to play in the decision of what substitutions are expected
when $\ell \to r\ [\varphi]$ is applied. We may use a new annotation to accommodate
these variables but there is a hack (also present in [38]): given a variable in
$\mathrm{Var}(r) \setminus \mathrm{Var}(\ell)$, it can be harmlessly appended to $\varphi$, syntactically and without
tampering with any interpretation. We henceforth assume that $\mathrm{Var}(r) \setminus \mathrm{Var}(\ell) \subseteq \mathrm{Var}(\varphi)$
for each rewrite rule $\ell \to r\ [\varphi]$. We also say that a substitution $\sigma$ respects
a logical constraint $\varphi$ if $\sigma(x)$ is a value for all $x \in \mathrm{Var}(\varphi)$ and $[\![\varphi\sigma]\!] = 1$.

Before presenting constrained HORPO, we recall that in [21] all sorts collapse
into one, and for example, $\mathsf{int} \to \mathsf{int} \to \mathsf{int}$ and $\mathsf{int} \to \mathsf{intlist} \to \mathsf{intlist}$ are
considered equal. The idea is that the original rewrite relation can be embedded
in the single-sorted one, and if the latter is well-founded, so is the former. We
follow this convention and henceforth compare types by their $\to$-structure only.
Below $\succ^{\mathsf{l}}_\varphi$ and $\succ^{\mathsf{m}}_\varphi$ are induced by $\succeq_\varphi$ and $\succ_\varphi$:


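Definition 6. Constrained HORPO depends on the following parameters:


1. The interpretation of theory symbols $\sqsupset_A : A \to A \to \mathsf{bool}$ for all $A \in \mathcal{S}_\vartheta$ such
that $[\![\sqsupset_A]\!]$ is a well-founded ordering over $\mathfrak{X}_A$. The interpretation $[\![\sqsupseteq_A]\!]$ is
assumed to be the reflexive closure of $[\![\sqsupset_A]\!]$. We usually write just $\sqsupset$ and $\sqsupseteq$
because sorts collapse. Consider $[\![\sqsupset]\!]$ the union $\bigcup_{A \in \mathcal{S}_\vartheta} [\![\sqsupset_A]\!]$, and $[\![\sqsupseteq]\!]$ likewise.

2. The precedence $\blacktriangleright$, a well-founded ordering over $\mathcal{F}$ such that $f \blacktriangleright g$ for all
$f \in \mathcal{F} \setminus \mathcal{F}_\vartheta$ and $g \in \mathcal{F}_\vartheta$.

3. The status $\mathsf{s}$, a mapping from $\mathcal{F}$ to $\{\mathsf{l}, \mathsf{m}_2, \mathsf{m}_3, \ldots\}$.


The higher-order recursive path ordering (HORPO) is a family of pairs of
type-preserving relations $(\succeq_\varphi, \succ_\varphi)$ indexed by logical constraints and defined by the
following conditions:


1. $s \succeq_\varphi t$ if and only if one of the following conditions is true:
(a) $s$ and $t$ are theory terms whose type is a sort, $\mathrm{Var}(s) \cup \mathrm{Var}(t) \subseteq \mathrm{Var}(\varphi)$
and $\varphi \models s \sqsupseteq t$.
(b) $s \succ_\varphi t$.
(c) $s \downarrow_\kappa t$.
(d) $s$ is not a theory term, $s = s_0\ s_1$, $t = t_0\ t_1$, $s_0 \succeq_\varphi t_0$ and $s_1 \succeq_\varphi t_1$.

2. $s \succ_\varphi t$ if and only if one of the following conditions is true:
(a) $s$ and $t$ are theory terms whose type is a sort, $\mathrm{Var}(s) \cup \mathrm{Var}(t) \subseteq \mathrm{Var}(\varphi)$
and $\varphi \models s \sqsupset t$.
(b) $s$ and $t$ have equal types and $s \triangleright_\varphi t$.
(c) $s$ is not a theory term, $s = f\ s_1 \cdots s_n$ for some $f \in \mathcal{F}$, $t = f\ t_1 \cdots t_n$,
$\forall i.\ s_i \succeq_\varphi t_i$ and $\exists k.\ s_k \succ_\varphi t_k$.
(d) $s$ is not a theory term, $s = x\ s_1 \cdots s_n$ for some $x \in \mathcal{V}$, $t = x\ t_1 \cdots t_n$,
$\forall i.\ s_i \succeq_\varphi t_i$ and $\exists k.\ s_k \succ_\varphi t_k$.

3. $s \triangleright_\varphi t$ if and only if $s$ is not a theory term, $s = f\ s_1 \cdots s_m$ for some $f \in \mathcal{F}$
and one of the following conditions is true:
(a) $\exists k.\ s_k \succeq_\varphi t$.
(b) $t = t_0\ t_1 \cdots t_n$, $\forall i.\ s \triangleright_\varphi t_i$.
(c) $t = g\ t_1 \cdots t_n$, $f \blacktriangleright g$, $\forall i.\ s \triangleright_\varphi t_i$.
(d) $t = f\ t_1 \cdots t_n$, $\mathsf{s}(f) = \mathsf{l}$, $s_1 \ldots s_m \succ^{\mathsf{l}}_\varphi t_1 \ldots t_n$, $\forall i.\ s \triangleright_\varphi t_i$.
(e) $t = f\ t_1 \cdots t_n$, $\mathsf{s}(f) = \mathsf{m}_k$, $k \le n$, $s_1 \ldots s_{\min(m,k)} \succ^{\mathsf{m}}_\varphi t_1 \ldots t_k$, $\forall i.\ s \triangleright_\varphi t_i$.
(f) $t$ is a value or a variable in $\mathrm{Var}(\varphi)$.


Here $s \downarrow_\kappa t$ if and only if there exists a term $r$ such that $s \to^*_\kappa r$ and $t \to^*_\kappa r$.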




Comparison to the Original HORPO. Conditions 1d, 2c and 2d are included
in the definition so that $\succeq_\varphi$ and $\succ_\varphi$ are rule-monotonic. We stress that it
is mandatory to use the weakened, rule-monotonicity requirement instead of
the traditional monotonicity requirement: if $\succ_{\mathfrak{t}}$ is monotonic, $1 \succ_{\mathfrak{t}} 0$ implies
$1 - 1 \succ_{\mathfrak{t}} 1 - 0$, but $\mathfrak{t} \models (1 - 0) \sqsupset (1 - 1)$, i.e., $\succ_{\mathfrak{t}}$ cannot possibly be well-founded.

From curried notation, another issue related to rule-monotonicity arises,
which leads to the above definition of $\triangleright_\varphi$. If we had the original HORPO naively
mirrored, the definition of $\succ_\varphi$ would include a condition which corresponds to
condition 3b and reads: “$s \succ_\varphi t$ if $s$ is not a theory term, $s = f\ s_1 \cdots s_m$ for
some $f \in \mathcal{F}$, $t = t_0\ t_1 \cdots t_n$ and $\forall i.\ s \succ_\varphi t_i \lor \exists k.\ s_k \succeq_\varphi t_i$”. Assume given such
terms $s$ and $t$, and that, say, $s \succ_\varphi t_1$. Now if there is a term $r$ to which $s$ can
be applied, we have a problem with proving $s\ r \succ_\varphi t\ r = t_0\ t_1 \cdots t_n\ r$ because
$s\ r \succ_\varphi t_1$ is not obtainable due to the type restriction. Note that $\succeq_\varphi$ and $\succ_\varphi$ are
by definition type-preserving, whereas $\triangleright_\varphi$ is not.

This limitation is overcome by means of $\triangleright_\varphi$, which actually makes the overall
definition more powerful, and is reminiscent of the distinction between > and 
>, in later versions of HORPO (e.g., [5]). Other extensions from these works, 
however, are not yet included in the above definition, and except for the type 
restriction and uncurried notation, the conditions of $\triangleright_\varphi$ largely match those of
the original HORPO. 

Another subtle difference is the use of generalized lexicographic and multiset 
orderings: in the original HORPO, $\succeq$ is the reflexive closure of $\succ$, and therefore
it suffices to use the more traditional definitions of lexicographic and multiset 
orderings. Here, as observed above, this would be unnecessarily restrictive. 

The split of a single multiset status label into $\mathsf{m}_2, \mathsf{m}_3, \ldots$ is due to curried
notation, in particular the possibility of partial application. If we had only a
single multiset status label, which would admit, for example, both $f\ 2\ 2 \succ_{\mathfrak{t}} f\ 1$
and $f\ 1\ 3 \succ_{\mathfrak{t}} f\ 2\ 2$, it would be possible that $\succ_{\mathfrak{t}}$ is not well-founded: note that
$g\ (f\ 1) \succ_{\mathfrak{t}} f\ 1\ 3$ due to, among others, conditions 2b and 3b, and if $f \blacktriangleright g$, we
would then have $f\ 2\ 2 \succ_{\mathfrak{t}} g\ (f\ 1)$ due to, among others, conditions 2b and 3c. This
change adds some power to constrained HORPO: we can prove, for example, the
termination of the single-rule system $f\ x\ a\ y \to f\ b\ x\ (g\ y)$ by choosing $\mathsf{s}(f) = \mathsf{m}_2$,
which is not possible if all arguments must be considered, as the original HORPO
requires. We do not need $\mathsf{m}_1$ because this case is already covered by choosing $\mathsf{l}$.


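Given an LCSTRS $\mathcal{R}$, if we can divide the set of rules into two subsets $\mathcal{R}_1$
and $\mathcal{R}_2$, and find a combination of $[\![\sqsupset]\!]$, $\blacktriangleright$ and $\mathsf{s}$ that guarantees $\ell \succeq_\varphi r$ for all
$\ell \to r\ [\varphi] \in \mathcal{R}_1$ and $\ell \succ_\varphi r$ for all $\ell \to r\ [\varphi] \in \mathcal{R}_2$, the termination of $\mathcal{R}$ is reduced
to that of $\mathcal{R}_1$. Before proving the soundness, we check out some examples: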


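Example 9. We continue the analysis of the motivating example $\mathsf{rec}$. Let $[\![\sqsupset_{\mathsf{int}}]\!]$
be $\lambda m.\ \lambda n.\ m > 0 \land m > n$ as above. There is only one function symbol in $\mathcal{F} \setminus \mathcal{F}_\vartheta$,
and it turns out that $\blacktriangleright$ can be any precedence. Let $\mathsf{s}$ be a mapping such that
$\mathsf{s}(\mathsf{rec}) = \mathsf{l}$. The first rewrite rule can be removed due to conditions 2b and 3a.
The second rewrite rule can be removed as follows:


1. $\mathsf{rec}\ n\ x\ f \succ_{n>0} f\ (n - 1)\ (\mathsf{rec}\ (n - 1)\ x\ f)$ by 2b, 2.
2. $\mathsf{rec}\ n\ x\ f \triangleright_{n>0} f\ (n - 1)\ (\mathsf{rec}\ (n - 1)\ x\ f)$ by 3b, 3, 4, 5.
3. $\mathsf{rec}\ n\ x\ f \triangleright_{n>0} f$ by 3a, 6.
4. $\mathsf{rec}\ n\ x\ f \triangleright_{n>0} n - 1$ by 3a, 7.
5. $\mathsf{rec}\ n\ x\ f \triangleright_{n>0} \mathsf{rec}\ (n - 1)\ x\ f$ by 3d, 8, 4, 9, 3.
6. $f \succeq_{n>0} f$ by 1c.
7. $n \succeq_{n>0} n - 1$ by 1a.
8. $n \succ_{n>0} n - 1$ by 2a.
9. $\mathsf{rec}\ n\ x\ f \triangleright_{n>0} x$ by 3a, 10.
10. $x \succeq_{n>0} x$ by 1c.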


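Example 10. Consider Example 5. Let $[\![\sqsupset_{\mathsf{int}}]\!]$ be $\lambda m.\ \lambda n.\ m > 0 \land m > n$. Let
$\blacktriangleright$ be a precedence such that $\mathsf{take} \blacktriangleright \mathsf{nil}$ and $\mathsf{take} \blacktriangleright \mathsf{cons}$. Let $\mathsf{s}$ be a mapping
such that $\mathsf{s}(\mathsf{take}) = \mathsf{l}$. Then we can remove all of the rewrite rules. Note that to
establish $\mathsf{take}\ n\ (\mathsf{cons}\ x\ l) \succ_{n>0} \mathsf{cons}\ x\ (\mathsf{take}\ (n - 1)\ l)$, we need $\mathsf{cons}\ x\ l \succeq_{n>0} x$,
which is obtainable because $\mathsf{intlist}$ is not distinguished from $\mathsf{int}$.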


4.4 Properties of Constrained HORPO 


The soundness of constrained HORPO as a technique for rule removal relies on 
the following properties, which we now prove. 


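Rule Orientation. The goal consists of two parts: $\succeq_{\mathfrak{t}}$ stably orients a rewrite
rule $\ell \to r\ [\varphi]$ if $\ell \succeq_\varphi r$, and $\succ_{\mathfrak{t}}$ stably orients a rewrite rule $\ell \to r\ [\varphi]$ if $\ell \succ_\varphi r$.
The core of the argument is to prove the following lemma:


Lemma 2. Given logical constraints $\varphi$ and $\varphi'$ such that $\mathrm{Var}(\varphi) \supseteq \mathrm{Var}(\varphi')$ and
$\varphi \models \varphi'$, $\mathfrak{t} \models \varphi'\sigma$ holds for each substitution $\sigma$ which respects $\varphi$.


Proof. It follows from $\varphi \models \varphi'$ that $[\![\varphi'\sigma]\!] = 1$. Note that $\mathrm{Var}(\varphi'\sigma) = \emptyset$, and
therefore $\varphi'\sigma\sigma' = \varphi'\sigma$ for all $\sigma'$. Hence, $\mathfrak{t} \models \varphi'\sigma$.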


And the rest is routine: 


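Theorem 2. Given a logical constraint $\varphi$, terms $s$ and $t$, the following statements
are true for each substitution $\sigma$ which respects $\varphi$:


1. $s \succeq_\varphi t$ implies $s\sigma \succeq_{\mathfrak{t}} t\sigma$.
2. $s \succ_\varphi t$ implies $s\sigma \succ_{\mathfrak{t}} t\sigma$.
3. $s \triangleright_\varphi t$ implies $s\sigma \triangleright_{\mathfrak{t}} t\sigma$.


Proof. By mutual induction on the derivation. Note that $\to_\kappa$ is stable.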


Rule-Monotonicity. Both $\succeq_\varphi$ and $\succ_\varphi$ are rule-monotonic for all $\varphi$. The former
is trivial to prove, and the key to proving the latter is the following lemma:


Lemma 3. $f\ s_1 \cdots s_m\ r \triangleright_\varphi t$ if $f\ s_1 \cdots s_m \triangleright_\varphi t$.


Proof. By induction on the derivation.


Now we can prove the rule-monotonicity:


Theorem 3. $\succ_\varphi$ is rule-monotonic.


Proof. By induction on the context $C[]$. Essentially, we ought to prove that given
terms $s$ and $t$ which have equal types, if $s$ is not a theory term and $s \succ_\varphi t$,
$s\ r \succ_\varphi t\ r$ for all $r$, and $r\ s \succ_\varphi r\ t$ for all $r$. We prove the former by case
analysis on the derivation of $s \succ_\varphi t$, and prove the latter by case analysis on $r$:
$r = f\ r_1 \cdots r_n$ for some $f \in \mathcal{F}$ or $r = x\ r_1 \cdots r_n$ for some $x \in \mathcal{V}$.


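Compatibility. The strict relation $\succ_{\mathfrak{t}}$ is compatible with its non-strict
counterpart $\succeq_{\mathfrak{t}}$: we prove that $\succeq_{\mathfrak{t}} ; \succ_{\mathfrak{t}}\ \subseteq\ \succ_{\mathfrak{t}} \cup\ (\succ_{\mathfrak{t}} ; \succ_{\mathfrak{t}})$, given the following observation:


Theorem 4. $\succeq_{\mathfrak{t}}\ =\ \succ_{\mathfrak{t}} \cup \downarrow_\kappa$.


Proof. By definition, $\succeq_{\mathfrak{t}}\ \supseteq\ \succ_{\mathfrak{t}} \cup \downarrow_\kappa$. We prove $\succeq_{\mathfrak{t}}\ \subseteq\ \succ_{\mathfrak{t}} \cup \downarrow_\kappa$ by induction on
the derivation of $s \succeq_{\mathfrak{t}} t$. Only two cases are non-trivial. If $s$ and $t$ are ground
theory terms whose type is a sort and $[\![s \sqsupseteq t]\!] = 1$, we have either $[\![s \sqsupset t]\!] = 1$ or
$[\![s]\!] = [\![t]\!]$, and the former implies $s \succ_{\mathfrak{t}} t$ while the latter implies $s \downarrow_\kappa t$. On the
other hand, if $s$ is not a theory term, $s = s_0\ s_1$, $t = t_0\ t_1$, $s_0 \succeq_{\mathfrak{t}} t_0$ and $s_1 \succeq_{\mathfrak{t}} t_1$,
by induction, if $s_0 \succ_{\mathfrak{t}} t_0$ or $s_1 \succ_{\mathfrak{t}} t_1$, we can prove $s \succ_{\mathfrak{t}} t$ in the same manner as
we prove the rule-monotonicity of $\succ_{\mathfrak{t}}$, or $s_0 \downarrow_\kappa t_0$ and $s_1 \downarrow_\kappa t_1$, then $s \downarrow_\kappa t$.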


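Theorem 4 plays an important role in the well-foundedness proof of $\succ_{\mathfrak{t}}$ as well.
For the compatibility of $\succ_{\mathfrak{t}}$ with $\succeq_{\mathfrak{t}}$, it remains to prove that $\downarrow_\kappa ; \succ_{\mathfrak{t}}\ \subseteq\ \succ_{\mathfrak{t}}$,
which is implied by the following lemma:


Lemma 4. Given terms $s$ and $s'$ such that $s \to_\kappa s'$, the following statements are
true for all $t$:


1. $s \succeq_{\mathfrak{t}} t$ if and only if $s' \succeq_{\mathfrak{t}} t$.
2. $s \succ_{\mathfrak{t}} t$ if and only if $s' \succ_{\mathfrak{t}} t$.
3. $s \triangleright_{\mathfrak{t}} t$ if and only if $s' \triangleright_{\mathfrak{t}} t$.


Proof. By mutual induction on the derivation for “if” and “only if” separately.
Note that $\to_\kappa$ is confluent.


The compatibility follows as a corollary:


Corollary 1. $\succeq_{\mathfrak{t}} ; \succ_{\mathfrak{t}}\ \subseteq\ \succ_{\mathfrak{t}} \cup\ (\succ_{\mathfrak{t}} ; \succ_{\mathfrak{t}})$.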


Well-Foundedness. Following [21], we base the well-foundedness proof of $\succ_{\mathfrak{t}}$ on
the predicate of computability [40,17]. There are, however, two major differences,
which pose new technical challenges: $\succeq_{\mathfrak{t}}$ is no longer the reflexive closure of $\succ_{\mathfrak{t}}$
and curried notation instead of uncurried notation is in use.

In Definition 6, $\succ^{\mathsf{l}}_\varphi$ and $\succ^{\mathsf{m}}_\varphi$ are induced by $\succeq_\varphi$ and $\succ_\varphi$. We need certain
properties of $\succ^{\mathsf{l}}_{\mathfrak{t}}$ and $\succ^{\mathsf{m}}_{\mathfrak{t}}$ to prove that $\succ_{\mathfrak{t}}$ is well-founded. Because $\succeq_{\mathfrak{t}}$ is neither
the equality over terms nor the reflexive closure of $\succ_{\mathfrak{t}}$, those properties are less
standard and deserve inspection. The property of $\succ^{\mathsf{l}}_{\mathfrak{t}}$ is relatively easy to prove:


Theorem 5. Given relations $\succsim$ and $\succ$ over $X$ such that $\succ$ is well-founded and
$\succsim ; \succ\ \subseteq\ \succ^+$, $\succ^{\mathsf{l}}$ is well-founded over $X^n$ for all $n$.


Proof. The standard method used when $\succsim$ is the equality still applies.


We refer to [41] for the proof of the following property of $\succ^{\mathsf{m}}_{\mathfrak{t}}$:


Theorem 6. Given relations $\succsim$ and $\succ$ over $X$ such that $\succsim$ is a quasi-ordering,
$\succ$ is well-founded and $\succsim ; \succ\ \subseteq\ \succ$, $\succ^{\mathsf{m}}$ is well-founded over $X^*$.


Proof. See Theorem 3.7 in [41].


In comparison to [41], we waive the transitivity requirement for $\succ$ above, but we
cannot get around the requirement that $\succsim$ is a quasi-ordering without significantly
changing the proof. This seems problematic because $\succeq_{\mathfrak{t}}$ is not necessarily transitive
due to its inclusion of $\succ_{\mathfrak{t}}$. Fortunately, one observation resolves this issue: $\succ^{\mathsf{m}}_{\mathfrak{t}}$
can equivalently be seen as induced by $\downarrow_\kappa$ and $\succ_{\mathfrak{t}}$ due to Theorem 4. In the same
spirit, we can prove the following property:


Theorem 7. $\downarrow^{\mathsf{m}}_\kappa ; \succ^{\mathsf{m}}_{\mathfrak{t}}\ \subseteq\ \succ^{\mathsf{m}}_{\mathfrak{t}}$ where $s_1 \ldots s_n \downarrow^{\mathsf{m}}_\kappa t_1 \ldots t_n$ if and only if there exists
a permutation $\pi$ over $\{1, \ldots, n\}$ such that $s_{\pi(i)} \downarrow_\kappa t_i$ for all $i$.


Proof. See Lemma 3.2 in [41].


Our definition of computability (or reducibility [17]) is standard: 

Definition 7. A term $t_0$ is called computable if either


1. the type of $t_0$ is a sort and $t_0$ is terminating with respect to $\succ_{\mathfrak{t}}$, or
2. the type of $t_0$ is $A \to B$ and $t_0\ t_1$ is computable for all computable $t_1 : A$.


In [21], a term is called neutral if it is not a $\lambda$-abstraction. Due to the exclusion
of $\lambda$-abstractions, one might consider all LCSTRS terms neutral. This naive
definition, however, does not capture the essence of neutrality: if a term $t_0$ is
neutral, a one-step reduct (with respect to $\succ_{\mathfrak{t}}$) of $t_0\ t_1$ can only be $t_0'\ t_1'$ where $t_0'$
and $t_1'$ are reducts of $t_0$ and $t_1$, respectively. Because of curried notation, neutral
LCSTRS terms should be defined as follows:


Definition 8. A term is called neutral if it assumes the form $x\ t_1 \cdots t_n$ for some
variable $x$.


And we recall the following results:


Theorem 8. Computable terms have the following properties:


1. Given terms $s$ and $t$ such that $s \succ_{\mathfrak{t}} t$, if $s$ is computable, so is $t$.
2. All computable terms are terminating with respect to $\succ_{\mathfrak{t}}$.
3. Given a neutral term $s$, if $t$ is computable for all $t$ such that $s \succ_{\mathfrak{t}} t$, so is $s$.


Proof. The standard proof still works despite the seemingly different definition
of neutrality.


In addition, we prove the following lemma:


Lemma 5. Given terms $s$ and $t$ such that $s \downarrow_\kappa t$, if $s$ is computable, so is $t$.


Proof. By induction on the type of $s$ and $t$.


And we have the following corollary due to Theorem 4:


Corollary 2. Given terms $s$ and $t$ such that $s \succeq_{\mathfrak{t}} t$, if $s$ is computable, so is $t$.


The goal is to prove that all terms are computable. To do so, the key is
to prove that $f\ s_1 \cdots s_m$ is computable where $f$ is a function symbol if $s_i$ is
computable for all $i$. In [21], this is done on the basis that $f\ s_1 \cdots s_m$ is neutral,
which is not true in our case. We do it differently and start with a definition:


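Definition 9. Given $f : A_1 \to \cdots \to A_n \to B$ where $f \in \mathcal{F}$ and $B \in \mathcal{S}$, let $\mathrm{ar}(f)$
be $n$. We introduce a special symbol $\top$ and extend our previous definitions so
that $\top \succ_{\mathfrak{t}} t$ for all $t \in T(\mathcal{F}, \mathcal{V})$ and $\top \downarrow_\kappa \top$. This way $\top \succeq_{\mathfrak{t}} t$ if $t \in T(\mathcal{F}, \mathcal{V})$ or
$t = \top$. Given terms $\bar{t} = t_1 \ldots t_n$, let $(\bar{t})_k$ be $t_k$ if $k \le n$, and $\top$ if $k > n$. Given
terms $s = f\ s_1 \cdots s_m$ and $t = g\ t_1 \cdots t_n$ where $f \in \mathcal{F}$, $g \in \mathcal{F}$, all $s_i$ and $t_i$ are
computable, we define $\succ_c$ such that $s \succ_c t$ if and only if $f \blacktriangleright g$, or $f = g$ and


– $\mathsf{s}(f) = \mathsf{l}$ and $(\bar{s})_1 \ldots (\bar{s})_{\mathrm{ar}(f)} \succ^{\mathsf{l}}_{\mathfrak{t}} (\bar{t})_1 \ldots (\bar{t})_{\mathrm{ar}(f)}$, or
– $\mathsf{s}(f) = \mathsf{m}_k$ and
  • $(\bar{s})_1 \ldots (\bar{s})_k \succ^{\mathsf{m}}_{\mathfrak{t}} (\bar{t})_1 \ldots (\bar{t})_k$, or
  • $(\bar{s})_1 \ldots (\bar{s})_k \downarrow^{\mathsf{m}}_\kappa (\bar{t})_1 \ldots (\bar{t})_k$, $\forall i > k.\ (\bar{s})_i \succeq_{\mathfrak{t}} (\bar{t})_i$ and $\exists i > k.\ (\bar{s})_i \succ_{\mathfrak{t}} (\bar{t})_i$.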


This gives us a well-founded relation: 


Lemma 6. >, is well-founded. 


Proof. Since all computable terms are terminating with respect to $\succ_{\mathfrak{t}}$, $\succ_{\mathfrak{t}}$ is
well-founded over computable terms. The introduction of $\top$ clearly does not
break this well-foundedness. The outermost layer of $\succ_c$ regards $\blacktriangleright$, which is
well-founded by definition. We need only to fix the function symbol $f$ and to go
deeper. If $s(f) = \mathfrak{l}$, we know that $\succ_{\mathfrak{t}}^{\mathfrak{l}}$ is well-founded over lists of length $\mathrm{ar}(f)$
because of Theorem 5. If $s(f) = \mathfrak{m}_k$, $\succ_c$ splits each list of arguments in two and
performs a lexicographic comparison. We can go past the first component because
of Theorems 6 and 7. And the rest, a pointwise comparison, is also well-founded.
So we can conclude that $\succ_c$ is well-founded.


Now we prove the aforementioned statement: 


Lemma 7. Given a term $s = f\ s_1 \cdots s_m$ where $f$ is a function symbol, if $s_i$ is
computable for all $i$, so is $s$.


Proof. By well-founded induction on $\succ_c$. We consider the type of $s$:

– If the type is a sort, we ought to prove that $s$ is terminating with respect
  to $\succ_{\mathfrak{t}}$. We need only to consider the cases in which $s$ is not a theory term
  because all theory terms are terminating with respect to $\succ_{\mathfrak{t}}$ due to the
  well-foundedness of $[\![\sqsupset]\!]$. Take an arbitrary term $t$ such that $s \succ_{\mathfrak{t}} t$. We prove
  that $t$ is terminating with respect to $\succ_{\mathfrak{t}}$ by case analysis on the derivation
  of $s \succ_{\mathfrak{t}} t$. If $t = f\ t_1 \cdots t_m$, $\forall i.\ s_i \succeq_{\mathfrak{t}} t_i$ and $\exists k.\ s_k \succ_{\mathfrak{t}} t_k$, we can prove that
  $s \succ_c t$. By induction, $t$ is computable and therefore terminating with respect
  to $\succ_{\mathfrak{t}}$. If $s \rhd_{\mathfrak{t}} t$, we prove that $t$ is computable for all $t$ such that $s \rhd_{\mathfrak{t}} t$ ($t$ is
  generalized) by inner induction on the derivation of $s \rhd_{\mathfrak{t}} t$:

  1. If $\exists k.\ s_k \succeq_{\mathfrak{t}} t$, $t$ is computable due to Corollary 2.

  2. If $t = t_0\ t_1 \cdots t_n$ and $\forall i.\ s \rhd_{\mathfrak{t}} t_i$, $t_i$ is computable for all $i$ by inner
     induction. By definition, $t$ is computable.

  3. If $t = g\ t_1 \cdots t_n$, $f \blacktriangleright g$ and $\forall i.\ s \rhd_{\mathfrak{t}} t_i$, $t_i$ is computable for all $i$ by inner
     induction. It follows from $f \blacktriangleright g$ that $s \succ_c t$, and $t$ is computable by outer
     induction.

  4. If $t = f\ t_1 \cdots t_n$, $s(f) = \mathfrak{l}$, $s_1 \ldots s_m \succ_{\mathfrak{t}}^{\mathfrak{l}} t_1 \ldots t_n$ and $\forall i.\ s \rhd_{\mathfrak{t}} t_i$, $t_i$ is
     computable for all $i$ by inner induction. Likewise, $s \succ_c t$.

  5. If $t = f\ t_1 \cdots t_n$, $s(f) = \mathfrak{m}_k$, $k \le n$, $s_1 \ldots s_{\min(m,k)} \succ_{\mathfrak{t}}^{\mathfrak{m}} t_1 \ldots t_k$ and
     $\forall i.\ s \rhd_{\mathfrak{t}} t_i$, $t_i$ is computable for all $i$ by inner induction. Likewise, $s \succ_c t$.

  6. If $t$ is a value, $t$ is terminating with respect to $\succ_{\mathfrak{t}}$ and its type is a sort.

– If the type is $A \to B$, take an arbitrary computable $s_{m+1} : A$. We prove
  that $s \succ_c s\ s_{m+1} = f\ s_1 \cdots s_{m+1}$. Note that $(s_1 \ldots s_m)_i = (s_1 \ldots s_{m+1})_i$ for
  all $i \le m$ and $(s_1 \ldots s_m)_{m+1} = \top \succ_{\mathfrak{t}} (s_1 \ldots s_{m+1})_{m+1}$. Consider $s(f) = \mathfrak{l}$,
  $s(f) = \mathfrak{m}_k$ while $k > m$, and $s(f) = \mathfrak{m}_k$ while $k \le m$. We have $s \succ_c s\ s_{m+1}$
  in each case. By induction, $s\ s_{m+1}$ is computable. Hence, $s$ is computable.


We conclude that s is computable. 
Now the well-foundedness of $\succ_{\mathfrak{t}}$ follows immediately:

Theorem 9. $\succ_{\mathfrak{t}}$ is well-founded.


Proof. We prove that every term $t$ is computable by induction on $t$. Given
Lemma 7, we need only to prove that variables are computable, which is the case
because variables are neutral and in normal form with respect to $\succ_{\mathfrak{t}}$.


5 Discussion: HORPO and Dependency Pairs 


In Section 4, we discussed rule removal, and presented a reduction ordering to 
prove termination. However, in practice it is not so common to directly use 
reduction orderings as a termination method. Rather, the norm in the literature 
nowadays is to use dependency pairs. 

The dependency pair framework [1,16] allows a single term rewriting system 
to be split into multiple “DP problems”, each of which can then be analyzed 
independently. The framework operates by iteratively simplifying DP problems 
until none remain, in which case the original system is proved terminating. There
are variants for many styles of term rewriting, including first-order LCTRSs [25]
and unconstrained higher-order TRSs [39,25,11]. 

Importantly, many existing techniques can be reformulated as “processors” 
(DP problem simplifiers) in the dependency pair framework. Such techniques 
include reduction orderings, which are at the heart of the dependency pair 
framework. This combination is far more powerful than using reduction orderings 
directly because the monotonicity requirement is replaced by weak monotonicity, 
and we do not have to orient the entire system in one go. 

Consider the following first-order LCTRS: 


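$$\begin{array}{ll}
\mathsf{u}\ x\ y \to \mathsf{u}\ (x + 1)\ (y * 2)\ \ [x < 100] & \mathsf{v}\ y \to \mathsf{v}\ (y - 1)\ \ [y > 0] \\
\mathsf{u}\ 100\ y \to \mathsf{v}\ y &
\end{array}$$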


This system cannot be handled by HORPO directly: the ordering $[\![\sqsupset_{\mathsf{int}}]\!]$ needs to
be fixed globally, so we can either orient the rewrite rule at the top-left corner or 
the one at the top-right corner, but not both at the same time. We could address 
this dilemma by using a more elaborate definition of HORPO (for example, 
by giving every function symbol an additional status that indicates the theory 
ordering to be used for each of its arguments), but this seems redundant: in 
practice, such a system would be handled by the dependency pair framework. 
Following the definition in [25], the above system would be split in two separate 
DP problems corresponding to the two loops: 


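$$\{\, \mathsf{u}^\sharp\ x\ y \Rightarrow \mathsf{u}^\sharp\ (x + 1)\ (y * 2)\ [x < 100] \,\} \qquad \{\, \mathsf{v}^\sharp\ y \Rightarrow \mathsf{v}^\sharp\ (y - 1)\ [y > 0] \,\}$$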


which could then be handled independently. 
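
To make the two loops concrete, the following Python sketch evaluates the example system on integer arguments and checks, at every step inside a loop, that a loop-specific measure decreases: $100 - x$ in the u-loop and $y$ in the v-loop. It is only an illustration and not part of Cora; the names `step`, `measure` and `run` and the tuple encoding of terms are assumptions made for this example.

```python
from typing import Optional, Tuple

# A term of the example system is modelled as ("u", x, y) or ("v", y).
Term = Tuple


def step(term: Term) -> Optional[Term]:
    """Apply the applicable rule of the example system, or return None."""
    if term[0] == "u":
        _, x, y = term
        if x < 100:                  # u x y -> u (x+1) (y*2)  [x < 100]
            return ("u", x + 1, y * 2)
        if x == 100:                 # u 100 y -> v y
            return ("v", y)
        return None                  # no rule applies when x > 100
    _, y = term
    if y > 0:                        # v y -> v (y-1)  [y > 0]
        return ("v", y - 1)
    return None


def measure(term: Term) -> int:
    """Loop-specific measure: 100 - x for u (x grows), y for v (y shrinks)."""
    return 100 - term[1] if term[0] == "u" else term[1]


def run(term: Term) -> int:
    """Rewrite to normal form and return the number of steps taken."""
    steps = 0
    while (nxt := step(term)) is not None:
        if nxt[0] == term[0]:        # a step that stays inside one loop
            assert 0 <= measure(nxt) < measure(term)
        term, steps = nxt, steps + 1
    return steps


print(run(("u", 98, 1)))
```

The u-loop is bounded because $x$ increases toward 100, the v-loop because $y$ decreases toward 0, so the two loops need opposite directions of the integer ordering. This is the reason a single global choice cannot orient both rules.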

While dependency pairs for LCSTRSs are not yet defined (and beyond the 
scope of this paper), we postulate that the definitions for curried higher-order 
rewriting in [11] and first-order constrained rewriting in [25] can be combined 
in a natural way. In this setting, HORPO would naturally be combined with 
argument filterings [1,11]. That is, since we only require weak monotonicity, some 
arguments can be removed. For example, the first DP problem above can be 
handled by showing the following inequalities: 


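$$\mathsf{u}^\sharp\ x \succ_{x<100} \mathsf{u}^\sharp\ (x + 1) \qquad \mathsf{u} \succeq_{x<100} \mathsf{u} \qquad \mathsf{v} \succeq_{y>0} \mathsf{v} \qquad \mathsf{u} \succeq_{\mathfrak{t}} \mathsf{v}$$


This is the case with $\mathsf{u} \blacktriangleright \mathsf{v}$.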


6 Implementation 


A preliminary implementation of LCSTRSs is available in Cora through the link: 
https://github.com/hezzel/cora 


Cora is an open-source analyzer for constrained rewriting, which can be used 
both as a stand-alone tool and as a library. Note that Cora is still in active 
development, and its functionalities, as well as its interface, are subject to change.
Nevertheless, Cora is already used in several student projects. Cora supports only
the theories of integers and booleans so far, but is intended to eventually support 
any theory, provided that an SMT solver is able to handle it. Example input files 
are supplied in the above repository. The version of Cora used for this paper is available in [28].


Automating Constrained HORPO. Cora includes an implementation of
constrained HORPO. Following existing termination tools such as AProVE [14],
NaTT [42] and Wanda [26], we use an SMT encoding such that a satisfying
assignment to variables in the SMT problem corresponds to a combination of the
precedence $\blacktriangleright$, the status $s$ and the ordering $[\![\sqsupset_{\mathsf{int}}]\!]$ that proves the termination of
the encoded system by constrained HORPO. As for booleans, we simply choose
the ordering $[\![\sqsupset_{\mathsf{bool}}]\!]$ such that $[\![\mathfrak{t} \sqsupset_{\mathsf{bool}} \mathfrak{f}]\!] = 1$.

To encode the precedence and the status, we introduce integer variables
$\mathrm{prec}_f$ and $\mathrm{stat}_f$ for each function symbol $f$ that is not a value. We require
that $\mathrm{prec}_f < 0$ if $f$ is a theory symbol, and that $\mathrm{prec}_f > 0$ otherwise, so that
$\mathrm{prec}_f > \mathrm{prec}_g$ corresponds to $f \blacktriangleright g$. The value $k$ of $\mathrm{stat}_f$ indicates $s(f) = \mathfrak{l}$ if
$k = 1$, and $s(f) = \mathfrak{m}_k$ if $k > 1$. We let down be a boolean variable which indicates
the choice between two possibilities for $[\![\sqsupset_{\mathsf{int}}]\!]$: $\lambda m.\lambda n.\ m > -M \wedge m > n$ and
$\lambda m.\lambda n.\ m < M \wedge m < n$ (the choice of $M$ is discussed below).
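
As a small illustration of how such an assignment is read back, the Python sketch below decodes integer values for the precedence and status variables into a precedence relation and a status map. The function name `decode`, the dictionaries `prec` and `stat`, and the sample symbols are assumptions made for this example and are not taken from Cora.

```python
from typing import Dict, List, Set, Tuple


def decode(prec: Dict[str, int], stat: Dict[str, int],
           theory: Set[str]) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Turn values of prec_f and stat_f into a precedence and a status."""
    for f, p in prec.items():
        # Theory symbols get negative values, all other symbols positive ones.
        assert (p < 0) if f in theory else (p > 0), f"bad prec value for {f}"
    for f, k in stat.items():
        assert k >= 1, f"bad stat value for {f}"
    # f is above g in the precedence exactly when prec_f > prec_g.
    above = [(f, g) for f in prec for g in prec if prec[f] > prec[g]]
    # stat_f = 1 encodes the lexicographic status, k > 1 the multiset status m_k.
    status = {f: "l" if k == 1 else f"m_{k}" for f, k in stat.items()}
    return above, status


# Hypothetical assignment for two defined symbols and one theory symbol.
above, status = decode({"f": 2, "g": 1, "+": -1},
                       {"f": 1, "g": 2, "+": 1},
                       theory={"+"})
print(above, status)
```

Because theory symbols receive negative values and all other symbols positive ones, every non-theory symbol ends up above every theory symbol in the decoded precedence.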

In the derivation of $s \succ_\varphi t$, all assertions assume the form $s'\ R_\varphi\ t'$ where
$s'$ and $t'$ are subterms of $s$ and $t$, respectively (see Example 9). Hence, given
a finite set of rewrite rules, there are only finitely many possible assertions to
be analyzed. By inspecting the definition of constrained HORPO, we also note
that there are no cyclic dependencies. For all $\ell \to r\ [\varphi]$, respective subterms $s$
and $t$ of $\ell$ and $r$, and $R \in \{\succeq, \succ, \rhd, 1a, 1b, \ldots, 3f\}$, we thus introduce a variable
$\langle s\ R_\varphi\ t\rangle$ with its defining constraint. Without going into detail for all the cases,
we provide a few key examples:


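– If $s$ and $t$ do not have equal types, we add $\neg\langle s \succeq_\varphi t\rangle$; otherwise, we add
  $\langle s \succeq_\varphi t\rangle \Rightarrow \langle s\ 1a_\varphi\ t\rangle \vee \langle s\ 1b_\varphi\ t\rangle \vee \langle s\ 1c_\varphi\ t\rangle \vee \langle s\ 1d_\varphi\ t\rangle$, which states that
  if $s \succeq_\varphi t$ holds, it must hold in one of the defining cases 1a, 1b, 1c and 1d.
  Each of these cases in turn has its defining constraint.

– $\langle f\ s_1 \cdots s_m\ 3c_\varphi\ g\ t_1 \cdots t_n\rangle \Rightarrow \mathrm{prec}_f > \mathrm{prec}_g \wedge \bigwedge_{i=1}^{n} \langle f\ s_1 \cdots s_m \rhd_\varphi t_i\rangle$.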

– We come up with the defining constraint of $\langle s\ 2a_\varphi\ t\rangle$ by case analysis:

  • If either of $s$ and $t$ is not a theory term, or their respective types are not
    the same theory sort, or $\mathrm{Var}(s) \cup \mathrm{Var}(t) \nsubseteq \mathrm{Var}(\varphi)$, we add $\neg\langle s\ 2a_\varphi\ t\rangle$.
  • Otherwise, we consider the type of $s$ and $t$:

    ∗ The type is int. We respectively check if $\varphi \Rightarrow s > -M \wedge s > t$
      and $\varphi \Rightarrow s < M \wedge s < t$ are valid. If the former is not valid,
      we add $\langle s\ 2a_\varphi\ t\rangle \Rightarrow \neg\mathrm{down}$; if the latter is not valid, we add
      $\langle s\ 2a_\varphi\ t\rangle \Rightarrow \mathrm{down}$. That is, if both of the validity checks fail, both
      of the constraints are added, which is equivalent to adding $\neg\langle s\ 2a_\varphi\ t\rangle$.

    ∗ The type is bool. We add $\neg\langle s\ 2a_\varphi\ t\rangle$ if $\varphi \Rightarrow s \wedge \neg t$ is not valid; if
      it is valid, nothing is added and the SMT solver is free to set true for
      the variable $\langle s\ 2a_\varphi\ t\rangle$.

  Here $M$ is twice the largest absolute value of integers occurring in the
  rewrite rules, or just 1000 if that is too large; this value is chosen arbitrarily.
  Note that the validity checks are not included as part of the SMT problem:
  if they were included, the satisfiability problem would contain universal
  quantification, which is typically hard to solve. We rather pose a separate
  question to the SMT solver every time we encounter theory comparison,
  and for integers, consider whether the pair can be oriented downward with
  $\lambda m.\lambda n.\ m > -M \wedge m > n$, upward with $\lambda m.\lambda n.\ m < M \wedge m < n$, or not at
  all. Hence, we must fix the bound $M$ beforehand.

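– The hardest is 3e: we need not only to encode the multiset comparison, but
  also to make sure that only $k$ arguments are to be considered on both sides
  (should there be more). Following Definition 4, we introduce boolean variables
  $\mathrm{strict}_1, \ldots, \mathrm{strict}_m$ where $\mathrm{strict}_i$ indicates $i \in I$, and integer variables
  $\pi(1), \ldots, \pi(n)$. The defining constraint of $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle$ is the
  conjunction of the following components:

  • $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle \Rightarrow 2 \le \mathrm{stat}_f \le n$.

  • $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle \Rightarrow \bigwedge_{j=1}^{n} \langle f\ s_1 \cdots s_m \rhd_\varphi t_j\rangle$.

  • $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle \Rightarrow \bigvee_{i=1}^{m} \mathrm{strict}_i$.

  • For all $i \in \{1, \ldots, m\}$, $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle \wedge \mathrm{strict}_i \Rightarrow i \le \mathrm{stat}_f$.
    That is, $I \subseteq \{1, \ldots, k\}$ if $s(f) = \mathfrak{m}_k$.

  • For all $j \in \{1, \ldots, n\}$, $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle \wedge j \le \mathrm{stat}_f \Rightarrow 1 \le \pi(j) \wedge \pi(j) \le m \wedge \pi(j) \le \mathrm{stat}_f$.
    That is, $1 \le \pi(j) \le \min(m, k)$ for all $j \in \{1, \ldots, k\}$ if $s(f) = \mathfrak{m}_k$.

  • For all $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n-1\}$ and $j' \in \{j+1, \ldots, n\}$,
    $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle \Rightarrow \mathrm{strict}_i \vee \pi(j) \ne i \vee \pi(j') \ne i$. That is,
    $|\pi^{-1}(i)| \le 1$ for all $i \in \{1, \ldots, m\} \setminus I$, which suffices because we can
    add to $I$ all $i \in \{1, \ldots, \min(m, k)\} \setminus I$ such that $|\pi^{-1}(i)| = 0$ without
    changing the generalized multiset ordering if $s(f) = \mathfrak{m}_k$.

  • For all $i \in \{1, \ldots, m\}$ and $j \in \{1, \ldots, n\}$, $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle \wedge \pi(j) = i \wedge \mathrm{strict}_i \Rightarrow \langle s_i \succ_\varphi t_j\rangle$.

  • For all $i \in \{1, \ldots, m\}$ and $j \in \{1, \ldots, n\}$, $\langle f\ s_1 \cdots s_m\ 3e_\varphi\ f\ t_1 \cdots t_n\rangle \wedge \pi(j) = i \wedge \neg\mathrm{strict}_i \Rightarrow \langle s_i \succeq_\varphi t_j\rangle$.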
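
The integer case of the theory comparison (case 2a) can be illustrated with a short Python sketch. Cora poses the two validity questions to an SMT solver; the sketch instead tests them exhaustively over a small finite range, which only approximates validity and is used purely for illustration. The helper names `bound` and `orientable`, the sample range, and the reading that "too large" means "larger than 1000" are all assumptions.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Env = Dict[str, int]


def bound(constants: Iterable[int], cap: int = 1000) -> int:
    """M: twice the largest absolute constant, or cap if that exceeds cap."""
    m = 2 * max((abs(c) for c in constants), default=0)
    return m if m <= cap else cap


def orientable(phi: Callable[[Env], bool],
               s: Callable[[Env], int],
               t: Callable[[Env], int],
               variables: List[str],
               M: int,
               sample: range = range(-150, 151)) -> Tuple[bool, bool]:
    """Return (down_ok, up_ok) over all sampled valuations satisfying phi.

    down_ok: phi implies s > -M and s > t
    up_ok:   phi implies s <  M and s < t
    """
    down_ok = up_ok = True
    for values in product(sample, repeat=len(variables)):
        env = dict(zip(variables, values))
        if not phi(env):
            continue                      # the constraint filters valuations
        a, b = s(env), t(env)
        down_ok = down_ok and (a > -M and a > b)
        up_ok = up_ok and (a < M and a < b)
    return down_ok, up_ok


# Constants of the example system from Section 5.
M = bound([1, 2, 100, 1, 0, 100])
# x versus x + 1 under [x < 100], and y versus y - 1 under [y > 0].
print(orientable(lambda e: e["x"] < 100, lambda e: e["x"],
                 lambda e: e["x"] + 1, ["x"], M))
print(orientable(lambda e: e["y"] > 0, lambda e: e["y"],
                 lambda e: e["y"] - 1, ["y"], M))
```

When the downward check fails, the encoding adds the implication to $\neg\mathrm{down}$; when the upward check fails, it adds the implication to $\mathrm{down}$. For the two comparisons in the usage example the checks fail in opposite directions, so no single value of down allows both.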


Cora succeeds in proving that all the examples in this paper are terminating, 
except Example 2, which is non-terminating. 


7 Related Work 


In this section, we assess the newly proposed formalism and the prospects for its 
application by comparing and relating it to the literature. 


Term Rewriting. The closest related work is LCTRSs [27,12], the first-order 
formalism for constrained rewriting upon which the present work is built. Similarly, 
there are numerous formalisms for higher-order term rewriting, but without 
built-in logical constraints, e.g., [21,22,31]. It seems likely that the methods for
analyzing those can be extended with support for SMT, as is done for
HORPO in this paper.

Also worth mentioning is the K Framework [35], which, like our formalism, 
can be used as an intermediate language for program analysis and is based on a 
form of first-order rewriting. The techniques of the K tool are based on reachability
logic rather than on methods like HORPO.

There are several works that analyze functional programs using term rewriting, 
e.g., [2,15]. However, they typically use translations to first-order systems. Hence, 
some of the structure of the initial problem is lost, and their power is weakened. 


HORPO. Our definition of constrained HORPO is based on the first-order 
constrained RPO for LCTRSs [27] and the first definition of higher-order RPO 
[21]. There have been other HORPO extensions since, e.g., [5,6], and we believe 
that the ideas for these extensions can also be applied to constrained HORPO. 
We have not done so because the purpose of this paper is to show that and how 
techniques for analyzing higher-order systems extend, not to introduce the most 
powerful (and consequently more elaborate) ones. 

Also worth mentioning is [4], a higher-order RPO for λ-free systems. This
variant is defined for the purpose of superposition rather than termination analysis, 
and is ground-total but generally not monotonic. 


Functional Programming. There are many works performing direct analyses 
of functional programs, including termination analysis, although they typically 
concern specific programming languages such as Haskell (e.g., [19]) and OCaml
(e.g., [20]). A variety of techniques have been proposed, such as sized types [33] 
and decreasing measures on data [18], but as far as we can find, there is no 
real parallel of many rewriting techniques such as RPO. We hope that, through 
LCSTRSs, we can help make the techniques of term rewriting available to the 
functional programming community. 


8 Conclusion and Future Work 


In summary, we have defined a higher-order extension of logically constrained 
term rewriting systems, which can represent realistic higher-order programs in a 
natural way. To illustrate how such systems may be analyzed, we have adapted 
HORPO, one of the oldest higher-order termination techniques, to handle logical 
constraints. Despite being a very basic method, this is already powerful enough 
to handle examples in this paper. Both LCSTRSs and constrained HORPO are 
implemented in our new analysis tool Cora. 

In the future, we intend to extend more techniques, both first-order and 
higher-order, to this formalism, and to implement them in a fully automatic 
tool. We hope that this will make the methods of the term rewriting community 
available to other communities, both by providing a powerful backend tool, and 
by showing how existing techniques can be adapted—so they may also be natively 
adopted in program analysis.

A natural starting point is to increase our power in termination analysis by
extending dependency pairs [1,39,11,25] and various supporting methods like the 
subterm criterion and usable rules. In addition, methods for analyzing complexity, 
reachability and equivalence (e.g., through rewriting induction [34,12]), which 
have been defined for first-order LCTRSs, are natural directions for higher-order 
extension as well.


Abstract. Sound static analyses are an important ingredient for compiler
optimizations and program verification tools. However, mathematically
proving that a static analysis is sound is a difficult task due to
two problems. First, soundness proofs relate two complicated program 
semantics (the static and the dynamic semantics) which are hard to rea- 
son about. Second, the more the static and dynamic semantics differ, 
the more work a soundness proof needs to do to bridge the impedance 
mismatch. These problems increase the effort and complexity of sound- 
ness proofs. Existing soundness theories address these problems by de- 
riving both the dynamic and static semantics from the same artifact, 
often called generic interpreter. A generic interpreter provides a com- 
mon structure along which a soundness proof can be composed, which 
avoids having to reason about the analysis as a whole. However, a generic 
interpreter restricts which analyses can be derived, as all derived analyses 
must roughly follow the program execution order. 

To lift this restriction, we develop a soundness theory for the blackboard 
analysis architecture, which is capable of describing backward, demand- 
driven, and summary-based analyses. The architecture describes static 
analyses with small independent modules, which communicate via a cen- 
tral store. Soundness of a compound analysis follows from soundness of 
all of its modules. Furthermore, modules can be proven sound indepen- 
dently, even though modules depend on each other. We evaluate our 
theory by proving soundness of four analyses: a pointer and call-graph 
analysis, a reflection analysis, an immutability analysis, and a demand- 
driven reaching definitions analysis. 


3 Hessian Center for Artificial Intelligence (hessian.AI), Darmstadt, Germany


1 Introduction


Developing static analyses is a laborious and complicated task due to the com- 
plexity of modern programming languages. A significant part of the complica- 
tion pertains to ensuring that static analyses are sound, i.e., over-approximate 
the runtime behavior of analyzed programs. Unfortunately, even well-established 
static analyses are shown to be unsound, e.g., since 2010, more than 80 soundness
bugs have been found in different analyses used in the LLVM compiler [46].
Testing helps find soundness bugs but cannot prove their absence, leaving the
trustworthiness of these analyses in question.


Mathematical soundness proofs ensure the absence of soundness bugs. However,
such proofs are difficult for two reasons: First, soundness proofs relate two
program semantics: the static semantics and the dynamic semantics [12], each
of which can be complex on its own. In particular, modern programming language
features such as reflection [30], concurrency [29], or native code [1] are
notoriously difficult to analyze and hard to reason about. Second, the style of
static and dynamic semantics can differ significantly, e.g., the static semantics 
of Doop [7], which is described in Datalog, differs significantly from dynamic 
semantics described with small-step rules [6]. This impedance mismatch makes 
soundness proofs monolithic, i.e., it is difficult to determine which parts of the 
static semantics relate to which parts of the dynamic semantics, requiring the 
soundness proofs to reason about both semantics as a whole. These problems 
complicate soundness proofs such that only leading experts with multiple years 
of experience can conduct them [13, 26]. 


To deal with the complexity of soundness proofs, existing works modularize 
static and dynamic semantics [5,14,28]. This modularization allows a soundness
proof for the entire analysis to be composed from soundness lemmas for small parts
of the analysis, so that these parts can be reasoned about one at a
time. These existing works require that both the static and dynamic semantics
are derived from the same artifact, often called a generic interpreter. A generic 
interpreter describes the operational semantics of a language, without referring 
to details of dynamic or static semantics, and provides a common structure along 
which a soundness proof can be composed. However, generic interpreters restrict 
what types of analyses can be derived. In particular, generic interpreters derive 
analyses that follow the program execution order, specifically, forward whole- 
program abstract interpreters. But it is unclear how other types of analyses can 
be derived that do not follow the program execution order, such as backward, 
demand-driven/lazy, or summary-based analyses. 


The work presented in this paper lifts this restriction by developing a soundness
theory for the blackboard analysis architecture. The architecture is the foundation
of the OPAL framework [21], which has been used to develop different
kinds of analyses, including backward analyses [17], on-demand/lazy analyses [19,41],
and summary-based analyses [21]. In the architecture, complex static
analyses are modularly composed from smaller, simpler static modules that han- 
dle individual language features, e.g., reflection, or program properties, e.g., 
immutability. These modules are decoupled—they are not allowed to call each 
other directly; instead, they communicate with each other by exchanging infor- 
mation via a central data store called blackboard [39] orchestrated by a fixpoint 
solver. 


To develop a soundness theory for the blackboard analysis architecture, we 
define a dynamic semantics, which follows the same style as the static seman- 
tics and thus avoids the impedance mismatch problem. Specifically, the dynamic 
semantics is composed of dynamic modules that communicate with each other
via a store. Our soundness theory is compositional, which means that each static
module can be proven sound individually and soundness for the compound analysis
follows from a meta theorem. Furthermore, we extend the theory to make
soundness proofs of existing static modules reusable across different analyses. In
particular, we prove that the soundness proof of a static module remains valid,
even if (a) the compound analysis processes source code elements unknown to the 
module and (b) the store contains other types of analysis information unknown 
to the module. Furthermore, our proofs are polymorphic in the lattices on which 
static modules operate, i.e., the lattices can be changed without affecting soundness.
For instance, we can reuse a pointer-static module, which typically depends
on an allocation-site lattice, in a reflection analysis to propagate string information
by extending this lattice without invalidating the pointer-static module's
soundness proof.

We demonstrate the applicability of our theory by implementing four different 
analyses and their dynamic semantics in the blackboard analysis architecture: (1) 
a pointer and call-graph analysis, (2) an analysis for reflection, (3) an immutabil- 
ity analysis, and (4) a demand-driven reaching-definitions analysis. Our choice 
of analyses is inspired by existing state-of-the-art analyses for Java implemented 
in the OPAL framework [21,41]. We implemented and tested each analysis and 
dynamic semantics in Scala to ensure they are executable. Furthermore, we used 
our theory to prove each analysis sound, where each analysis exercises a different 
aspect of our theory: (1) static modules can be proven sound independently de- 
spite mutually depending on each other, (2) soundness of modules remains valid 
even though the lattice changes, (3) soundness of a module remains valid even 
though different source code elements are analyzed, and (4) our theory applies 
to analyses which do not follow the program execution order. 


In summary, we make the following contributions: 


— We give the first formalization of the blackboard analysis architecture (Sec- 
tion 2). 

— We develop a theory of compositional soundness proofs for the formal model 
of the blackboard analysis architecture. We prove that soundness of an anal- 
ysis follows from independent soundness proofs for each of its modules (Sec- 
tion 3). 

— We show how to make soundness proofs reusable by extending our the- 
ory (Section 4). 

— We demonstrate the applicability of our theory on four different types of 
analyses (Section 5). 


All proofs of theorems, lemmas, and case studies are provided in the paper’s
supplementary material.


2 Blackboard Analysis Architecture 


In this section, we introduce and formalize the static and dynamic semantics of 
the blackboard analysis architecture used in the OPAL framework [21].


2.1 Static Semantics


Static analyses in the blackboard analysis architecture consist of multiple static 
modules exchanging information via a central data store called blackboard [39]. 
This avoids coupling between modules as they are not allowed to call each other 
directly: Modules store analysis results in the blackboard using keys. These keys 
allow other modules to retrieve results without needing to know their producer. 


Definition 1 (Static Semantics). We define basic notions and datatypes of
the static semantics of the blackboard analysis architecture:


1. Entities ($\hat{e} \in \widehat{\mathrm{Entity}}$)⁴ are parts of programs an analysis can compute
   information for. For example, entities could be classes, methods, statements,
   fields, variables, or allocation sites of objects. Entities are ordered discretely:
   $\hat{e}_1 \sqsubseteq \hat{e}_2$ iff $\hat{e}_1 = \hat{e}_2$.

2. Kinds ($\kappa \in \mathrm{Kind}$) identify analysis information that can be computed for
   an entity. For example, a class entity could have kinds for its immutability
   or thread safety, a variable entity could have kinds for its definition site or
   approximations of its value. Kinds are also ordered discretely.

3. Properties ($\hat{p} \in \widehat{\mathrm{Property}}[\kappa]$ where $\widehat{\mathrm{Property}} : \mathrm{Kind} \to \mathrm{Lattice}$) denote analysis
   information which is identified by a kind $\kappa$. For instance, a class entity could
   have an immutability property “mutable” or “immutable”. Properties of a kind
   are partially ordered and form a lattice.

4. A central store ($\hat{\sigma} \in \widehat{\mathrm{Store}} \subseteq \widehat{\mathrm{Entity}} \times (\kappa : \mathrm{Kind}) \rightharpoonup \widehat{\mathrm{Property}}[\kappa]$)⁵ contains all
   properties for each entity and kind. We use the notation $\hat{\sigma}(\hat{e}, \kappa)$ for a store
   lookup of an entity $\hat{e}$ and kind $\kappa$, which results in the bottom element $\bot$ in case
   the property is not present. Furthermore, we use the notation $\hat{\sigma} \sqcup [\hat{e}, \kappa \mapsto \hat{p}]$
   for writing a new property $\hat{p}$ to the store. If a property for the entity $\hat{e}$ and
   kind $\kappa$ already exists in the store, then the old property is joined with the new
   property. Stores are ordered point-wise.

5. Static modules ($\hat{f} \in \widehat{\mathrm{Module}} = \widehat{\mathrm{Entity}} \times \widehat{\mathrm{Store}} \to \widehat{\mathrm{Store}}$) are monotone functions
   that compute properties of a given entity. The store allows multiple static
   modules to communicate and exchange information without having to call
   each other directly. Each static module has access to the entire store and can
   contribute to one or more properties.

6. The fixpoint algorithm ($\mathrm{fix} : \mathcal{P}(\widehat{\mathrm{Module}}) \times \widehat{\mathrm{Store}} \to \widehat{\mathrm{Store}}$) computes a fixpoint
   of a compound analysis $\hat{F} \in \mathcal{P}(\widehat{\mathrm{Module}})$ for an initial store $\hat{\sigma}_1$. More specifically,
   the fixpoint $\mathrm{fix}(\hat{F}, \hat{\sigma}_1)$ is a store $\hat{\sigma}_n \sqsupseteq \hat{\sigma}_1$ such that static modules $\hat{f} \in \hat{F}$
   do not add new information, i.e., $\hat{f}(\hat{e}, \hat{\sigma}_n) = \hat{\sigma}_n$ for all $\hat{e} \in \mathrm{dom}(\hat{\sigma}_n)$. The
   fixpoint is unique and guaranteed to exist when all properties are lattices of
   finite height [10].


4 We use a hat symbol ($\hat{\ }$) to disambiguate static definitions from dynamic definitions
with the same name but without hat.

5 The syntax $A \rightharpoonup B$ denotes a partial function from $A$ to $B$. Furthermore, $\mathrm{dom}(f)$ is
the set of all inputs for which a partial function $f$ is defined.

The types Entity, Kind, and Property are defined by analysis developers, whereas
the other types and functions are fixed by this definition. 
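
The store operations and the fixpoint computation of Definition 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not OPAL's implementation: every property is modelled as a finite set ordered by inclusion, so that join is set union and the bottom element is the empty set. The names `lookup`, `write`, `fix` and the toy module `reach` are introduced only for this example.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Tuple

Key = Tuple[Hashable, str]                  # (entity, kind)
Store = Dict[Key, FrozenSet]                # absent keys stand for bottom
Module = Callable[[Hashable, Store], Store]


def lookup(store: Store, entity: Hashable, kind: str) -> FrozenSet:
    """Store lookup; yields the bottom element (empty set) if absent."""
    return store.get((entity, kind), frozenset())


def write(store: Store, entity: Hashable, kind: str, prop: Iterable) -> Store:
    """Return a new store in which prop is joined with the old property."""
    new = dict(store)
    new[(entity, kind)] = lookup(store, entity, kind) | frozenset(prop)
    return new


def fix(modules: Iterable[Module], store: Store) -> Store:
    """Naive iteration: apply every module to every entity in the store
    until no module adds information."""
    modules = list(modules)
    changed = True
    while changed:
        changed = False
        for entity in {e for (e, _) in store}:
            for module in modules:
                new = module(entity, store)
                if new != store:
                    store, changed = new, True
    return store


def reach(node: Hashable, store: Store) -> Store:
    """Toy module: a node reaches its successors and whatever they reach."""
    succs = lookup(store, node, "succ")
    result = set(succs)
    for s in succs:
        result |= lookup(store, s, "reach")
    return write(store, node, "reach", result)


initial = {("a", "succ"): frozenset({"b"}),
           ("b", "succ"): frozenset({"c"}),
           ("c", "succ"): frozenset()}
print(lookup(fix([reach], initial), "a", "reach"))
```

The loop terminates here because the toy lattice is finite and `write` only ever joins, so each property can grow only finitely often. This mirrors the finite-height condition stated in item 6.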


We illustrate Definition 1 using the example of a textbook reaching-definitions
analysis [38] for an imperative language with labeled assignments and expressions:


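$$\begin{aligned}
\widehat{\mathrm{Entity}} &= \mathrm{Stmt} \\
\widehat{\mathrm{Property}}[\kappa_{\mathrm{ControlFlowPred}}] &= \mathcal{P}(\mathrm{Stmt}) \\
\widehat{\mathrm{Property}}[\kappa_{\mathrm{ReachingDefs}}] &= \mathrm{Var} \to \mathcal{P}(\mathrm{Assign}) \\
\widehat{\mathrm{Store}} &= [\mathrm{Stmt} \times \kappa_{\mathrm{ControlFlowPred}} \rightharpoonup \mathcal{P}(\mathrm{Stmt})] \\
&\quad \cup\ [\mathrm{Stmt} \times \kappa_{\mathrm{ReachingDefs}} \rightharpoonup (\mathrm{Var} \to \mathcal{P}(\mathrm{Assign}))]
\end{aligned}$$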


reachingDefs(stmt: Entity, σ̂: Store): Store =
  predecessors = σ̂(stmt, κ_ControlFlowPred)
  in = ⨆_{p ∈ predecessors} σ̂(p, κ_ReachingDefs)
  out = stmt match
    case Assign(x,_,_) => in[x ↦ {stmt}]
    case _ => in
  σ̂ ⊔ [stmt, κ_ReachingDefs ↦ out]


The static module reachingDefs is implemented with Scala-like pseudo code.
Module reachingDefs computes for every statement of the program which variable
definitions reach it. Therefore, entities are statements and the module’s
property is a mapping from variables to the assignments that may have defined them.
Module reachingDefs joins the reaching definitions of all control-flow predecessors
and then updates them on variable assignments. Note that module reachingDefs
neither computes the control-flow predecessors directly nor does it call another
module which computes this information. Instead, it retrieves this information
from the store $\hat{\sigma}$. This decoupling avoids dependencies between static modules
and enables compositional soundness proofs.
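
For readers who want to run the example, the pseudo code above can be approximated in Python as follows. The sketch makes several assumptions: statements are encoded as pairs of a label and an optional assigned variable, the control-flow predecessors are placed in the initial store by hand, and the driver `analyze` simply reapplies the module until the store is stable. None of these names come from OPAL.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

CFP, RD = "ControlFlowPred", "ReachingDefs"    # the two kinds
Stmt = Tuple[int, Optional[str]]                # (label, assigned variable or None)
Defs = Dict[str, FrozenSet[int]]                # variable -> labels of assignments


def join(a: Defs, b: Defs) -> Defs:
    """Pointwise union of two reaching-definitions maps."""
    return {x: a.get(x, frozenset()) | b.get(x, frozenset())
            for x in a.keys() | b.keys()}


def reaching_defs(stmt: Stmt, store: dict) -> dict:
    """Static module: join the predecessors' results, then apply the assignment."""
    label, var = stmt
    incoming: Defs = {}
    for p in store.get((label, CFP), frozenset()):
        incoming = join(incoming, store.get((p, RD), {}))
    out = dict(incoming)
    if var is not None:                         # case Assign(x, _, _)
        out[var] = frozenset({label})
    new = dict(store)
    new[(label, RD)] = join(store.get((label, RD), {}), out)  # write joins
    return new


def analyze(program: List[Stmt], store: dict) -> dict:
    """Reapply the module to all statements until the store stops changing."""
    while True:
        new = store
        for stmt in program:
            new = reaching_defs(stmt, new)
        if new == store:
            return store
        store = new


# Hypothetical program: 1: x := ..; 2: y := ..; 3: loop head; 4: x := ..; 5: exit
PROGRAM = [(1, "x"), (2, "y"), (3, None), (4, "x"), (5, None)]
PREDS = {(2, CFP): frozenset({1}), (3, CFP): frozenset({2, 4}),
         (4, CFP): frozenset({3}), (5, CFP): frozenset({3})}
RESULT = analyze(PROGRAM, PREDS)
print(RESULT[(3, RD)])
```

As in the pseudo code, `reaching_defs` never computes predecessors itself. It only reads them from the store, so any other module (or, as here, a hand-written initial store) can supply them.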


2.2 Dynamic Semantics 


Static analyses in the blackboard analysis architecture are proven sound with 
respect to a dynamic semantics in the same style, which we define formally in 
this subsection: 


Definition 2 (Dynamic Semantics). We define the dynamic semantics used
to prove soundness of analyses in the blackboard analysis architecture:


1. The dynamic semantics depends on concrete versions of entities ($e \in \mathrm{Entity}$),
   properties ($p \in \mathrm{Property}[\kappa]$ where $\mathrm{Property} : \mathrm{Kind} \to \mathrm{Set}$) and stores
   ($\sigma \in \mathrm{Store} \subseteq \mathrm{Entity} \times (\kappa : \mathrm{Kind}) \rightharpoonup \mathrm{Property}[\kappa]$). The kinds are the same as for
   static modules.

2. Dynamic modules ($f \in \mathrm{Module} = \mathrm{Entity} \times \mathrm{Store} \rightharpoonup \mathrm{Store}$) are partial functions
   which may only be defined for a subset of entities. Furthermore, the partial
   function is undefined in case it tries to look up an element from the store
   which is not present.

3. Static analyses are proven sound with respect to a dynamic reachability
   semantics ($\mathrm{reachable} : \mathcal{P}(\mathrm{Module}) \times \mathrm{Store} \to \mathcal{P}(\mathrm{Store})$). The reachability
   semantics returns the set of all reachable stores by iteratively applying a set
   of dynamic modules. More specifically, the set $\mathrm{reachable}(F, \sigma_1)$ contains store
   $\sigma_1$ and for all $f \in F$, reachable stores $\sigma$, and for entities $e \in \mathrm{dom}(\sigma)$, the set
   contains $f(e, \sigma)$, if it is defined.


We illustrate these definitions again using the example of the reaching-definitions
analysis which we introduced in the previous subsection:


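$$\begin{aligned}
\mathrm{Entity} &= \mathrm{Stmt} \mid \mathrm{Unit} \\
\mathrm{Property}[\kappa_{\mathrm{ControlFlowPred}}] &= \mathrm{Stmt} \\
\mathrm{Property}[\kappa_{\mathrm{ReachingDefs}}] &= \mathrm{Var} \rightharpoonup \mathrm{Assign} \\
\mathrm{Property}[\kappa_{\mathrm{State}}] &= \mathrm{ProgramState} \\
\mathrm{Store} &= [\mathrm{Stmt} \times \kappa_{\mathrm{ControlFlowPred}} \rightharpoonup \mathrm{Stmt}] \cup [\mathrm{Stmt} \times \kappa_{\mathrm{ReachingDefs}} \rightharpoonup (\mathrm{Var} \rightharpoonup \mathrm{Assign})] \\
&\quad \cup\ [\mathrm{Unit} \times \kappa_{\mathrm{State}} \rightharpoonup \mathrm{ProgramState}]
\end{aligned}$$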


reachingDefs(stmt: Entity, σ: Store): Store =
  predecessor = σ(stmt, κ_ControlFlowPred)
  in = σ(predecessor, κ_ReachingDefs)
  out = stmt match
    case Assign(x,_,_) => in[x ↦ stmt]
    case _ => in
  σ[stmt, κ_ReachingDefs ↦ out]


controlFlow(stmt1: Entity, σ: Store): Store =
  state1 = σ(Unit, κ_State)
  (stmt2, state2) = step(stmt1, state1)
  σ[stmt2, κ_ControlFlowPred ↦ stmt1][Unit, κ_State ↦ state2]


Dynamic module reachingDefs is analogous to its static counterpart reachingDefs,
but computes the most recent definition of a variable instead of all possible definitions.
The dynamic module depends on the control-flow predecessor, which is
the most recently executed statement. The control-flow predecessors are computed
by module controlFlow, which is based on a small-step operational semantics
$\mathrm{step} : \mathrm{Stmt} \times \mathrm{ProgramState} \rightharpoonup \mathrm{Stmt} \times \mathrm{ProgramState}$. Module controlFlow
demonstrates that the blackboard architecture is capable of integrating existing
dynamic operational semantics, such as those for Java [6] or WebAssembly [18].
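
The dynamic modules and the reachability semantics of Definition 2 can be approximated in Python as follows. The sketch rests on several assumptions that are not in the text: the program is a hypothetical three-statement straight-line program, the program state is reduced to a program counter, partiality is modelled by returning `None`, stores are frozen sets of entries so that they can be collected in a set, and the initial store seeds the reaching definitions of the first statement.

```python
from typing import FrozenSet, Optional, Tuple

CFP, RD, STATE, UNIT = "ControlFlowPred", "ReachingDefs", "State", "Unit"
PROGRAM = {1: "x", 2: "y", 3: "x"}          # label -> assigned variable
Store = FrozenSet[Tuple[Tuple[object, str], object]]


def get(store: Store, entity, kind):
    """Lookup that yields None where the partial store is undefined."""
    return dict(store).get((entity, kind))


def put(store: Store, entity, kind, value) -> Store:
    """Overwrite one entry and return the new (hashable) store."""
    d = dict(store)
    d[(entity, kind)] = value
    return frozenset(d.items())


def step(label, pc):
    """Toy small-step semantics: only the current statement can step."""
    if pc != label or label + 1 not in PROGRAM:
        return None
    return label + 1, label + 1


def control_flow(stmt1, store: Store) -> Optional[Store]:
    state1 = get(store, UNIT, STATE)
    result = None if state1 is None else step(stmt1, state1)
    if result is None:
        return None                          # module undefined here
    stmt2, state2 = result
    return put(put(store, stmt2, CFP, stmt1), UNIT, STATE, state2)


def reaching_defs(stmt, store: Store) -> Optional[Store]:
    pred = get(store, stmt, CFP)
    incoming = None if pred is None else get(store, pred, RD)
    if incoming is None:
        return None                          # a lookup failed: undefined
    out = {**dict(incoming), PROGRAM[stmt]: stmt}   # most recent definition
    return put(store, stmt, RD, tuple(sorted(out.items())))


def reachable(modules, store1: Store) -> set:
    """All stores reachable by applying any module to any entity."""
    seen, todo = {store1}, [store1]
    while todo:
        store = todo.pop()
        for entity in {e for (e, _), _ in store}:
            for f in modules:
                new = f(entity, store)
                if new is not None and new not in seen:
                    seen.add(new)
                    todo.append(new)
    return seen


INITIAL: Store = frozenset({((1, RD), (("x", 1),)), ((UNIT, STATE), 1)})
STORES = reachable([control_flow, reaching_defs], INITIAL)
print(len(STORES))
```

Because the two modules communicate only through the store, `reachable` interleaves them freely: reaching definitions for a statement may be recorded before or after the control flow has moved on, and both orders lead to the same final store in this example.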

The blackboard analysis architecture not only modularizes the static seman- 
tics but also the dynamic semantics, which is crucial for enabling compositional 
and reusable soundness proofs. In particular, each static module is proven sound 
with respect to exactly one dynamic module, which limits the proof scope and 
guarantees proof independence. Furthermore, for analyses that approximate non- 
standard dynamic semantics, the standard dynamic semantics can be modularly 
extended with further modules (e.g., Section 5.1). 

To summarize, in this section we formally defined the blackboard analysis
architecture, which allows static analyses to be implemented modularly.
Furthermore, we defined a dynamic semantics in the same style against which
analyses are proven sound.
3 Compositional Soundness Proofs


In this section, we develop a theory of compositional soundness proofs for anal- 
yses in the blackboard style: Soundness of a compound analysis follows directly 
from soundness of the individual static modules. This soundness theory simplifies 
the soundness proof, because it allows analysis developers to focus on soundness 
of individual static modules, instead of having to reason about soundness of 
the interaction of all static modules with each other. Furthermore, the sound- 
ness theory makes the proofs more maintainable, as a change to a module only 
affects the proof of that module and nothing else. 

We start the section by defining soundness of static modules and then work 
up to soundness of whole analyses. The definitions of soundness are standard 
and build upon the theory of abstract interpretation [12]: 


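Definition 3 (Soundness of Static Modules). A static module
$\hat{f} \in \widehat{\mathit{Module}}$ is sound if it overapproximates its
dynamic counterpart $f \in \mathit{Module}$:

$$\mathit{sound}(\hat{f}, f) \iff \forall \hat{e} \in \widehat{\mathit{Entity}},\ \hat{\sigma} \in \widehat{\mathit{Store}},\ e \in \gamma_{\mathit{Entity}}(\hat{e}),\ \sigma \in \gamma_{\mathit{Store}}(\hat{\sigma}).\quad f(e, \sigma) \in \gamma_{\mathit{Store}}(\hat{f}(\hat{e}, \hat{\sigma}))$$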


The expression $x \in \gamma(\hat{y})$ reads as "element $\hat{y}$ soundly
overapproximates the concrete element $x$." Function
$\gamma : \hat{L} \to \mathcal{P}(L)$ is a monotone function from an abstract
domain $\hat{L}$ to the powerset of a concrete domain $L$ and is called the
concretization function. We require neither that an abstraction function
$\alpha : \mathcal{P}(L) \to \hat{L}$ in the opposite direction exists nor that
$\gamma$ and $\alpha$ form a Galois connection; neither is necessary for
soundness proofs.

The soundness definition above requires that analysis developers define
concretizations for entities
($\gamma_{\mathit{Entity}} : \widehat{\mathit{Entity}} \to \mathcal{P}(\mathit{Entity})$)
and properties
($\gamma_{\mathit{Property}} : \widehat{\mathit{Property}}[\kappa] \to \mathcal{P}(\mathit{Property}[\kappa])$).
Often the abstract and concrete entities are of the same type
($\widehat{\mathit{Entity}} = \mathit{Entity}$). In this case, the concretization
functions map to singleton sets ($\gamma_{\mathit{Entity}}(e) = \{e\}$). Based on
concretization functions for entities, kinds, and properties, we define a
point-wise concretization on stores. The definition can be found in the
supplementary material.
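On finite domains, Definition 3 can be checked by brute force, which may help when experimenting with a module before attempting a proof. The sketch below is an illustration under assumptions that are not from the paper: properties are integers abstracted by signs, the integers are cut down to the finite `UNIVERSE`, entities are their own abstraction (so the entity concretization is a singleton), and `negate`, `negate_hat`, and `broken_hat` are made-up modules.

```python
UNIVERSE = range(-2, 3)                  # finite stand-in for the integers
SIGNS = ["neg", "zero", "pos", "top"]


def gamma_prop(sign):
    """Concretization of an abstract sign within the finite universe."""
    return {n for n in UNIVERSE
            if sign == "top"
            or (sign == "neg" and n < 0)
            or (sign == "zero" and n == 0)
            or (sign == "pos" and n > 0)}


def in_gamma_store(store, abs_store):
    """Point-wise check that `store` lies in the concretization of `abs_store`."""
    return all(key in abs_store and value in gamma_prop(abs_store[key])
               for key, value in store.items())


def negate(entity, store):               # dynamic module
    return {**store, (entity, "val"): -store[(entity, "val")]}


FLIP = {"neg": "pos", "pos": "neg", "zero": "zero", "top": "top"}


def negate_hat(entity, abs_store):       # static module
    return {**abs_store, (entity, "val"): FLIP[abs_store[(entity, "val")]]}


def broken_hat(entity, abs_store):       # deliberately unsound: forgets to flip
    return dict(abs_store)


def sound(static_module, dynamic_module, entity="x"):
    """Exhaustively checks Definition 3 for single-entry stores."""
    for sign in SIGNS:
        abs_store = {(entity, "val"): sign}
        for n in gamma_prop(sign):
            store = {(entity, "val"): n}
            if not in_gamma_store(dynamic_module(entity, store),
                                  static_module(entity, abs_store)):
                return False
    return True


negate_is_sound = sound(negate_hat, negate)
broken_is_sound = sound(broken_hat, negate)
```

Such a check is evidence, not a proof: it only covers the finite universe and the store shapes it enumerates.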

In the following, we define soundness of compound analyses. 


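Definition 4 (Soundness of a Compound Analysis). Let
$\Phi \subseteq \widehat{\mathit{Module}} \times \mathit{Module}$ be a set of
static modules paired with corresponding dynamic modules. A compound analysis
is sound if the fixpoint of all of its static modules overapproximates the
reachability semantics of the corresponding dynamic modules:

$$\mathit{sound}(\Phi) \iff \forall \hat{\sigma} \in \widehat{\mathit{Store}}.\ \mathit{reachable}(F, \gamma_{\mathit{Store}}(\hat{\sigma})) \subseteq \gamma_{\mathit{Store}}(\mathit{fix}(\hat{F}, \hat{\sigma}))$$

where $\hat{F} = \{\hat{f} \mid (\hat{f}, \_) \in \Phi\}$ and
$F = \{f \mid (\_, f) \in \Phi\}$.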


The compound analysis approximates the dynamic reachability semantics (Def- 
inition 2.3), which collects the set of all stores reachable by applying dynamic 
modules. The dynamic reachability semantics is a collecting semantics, com- 
monly used to prove soundness of abstract interpreters [12]. 
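The inclusion in Definition 4 can be observed on a small example. The following Python sketch makes several simplifying assumptions of its own: abstract properties are finite sets of integers with union as join (so concretization is the identity on sets), `fix` is a naive iteration rather than a production fixpoint solver, and the modules `inc` and `inc_hat` are invented for the illustration. It checks the inclusion for one initial store only, not for all abstract stores.

```python
def join(a, b):
    """Point-wise join of abstract stores whose properties are frozensets."""
    return {key: a.get(key, frozenset()) | b.get(key, frozenset())
            for key in a.keys() | b.keys()}


def fix(static_modules, abs_store):
    """Applies all static modules to all entities until the store is stable."""
    while True:
        new_store = abs_store
        for module in static_modules:
            for (entity, _kind) in list(abs_store):
                new_store = join(new_store, module(entity, abs_store))
        if new_store == abs_store:
            return abs_store
        abs_store = new_store


def reachable(dynamic_modules, initial_stores):
    """Collecting semantics: every store reachable from the initial stores."""
    seen = {frozenset(store.items()) for store in initial_stores}
    worklist = list(seen)
    while worklist:
        store = dict(worklist.pop())
        for module in dynamic_modules:
            for (entity, _kind) in list(store):
                successor = frozenset(module(entity, store).items())
                if successor not in seen:
                    seen.add(successor)
                    worklist.append(successor)
    return [dict(store) for store in seen]


def in_gamma(store, abs_store):
    """Concrete store is covered if each value is in the abstract set."""
    return all(key in abs_store and value in abs_store[key]
               for key, value in store.items())


def inc(entity, store):                  # dynamic module: counter modulo 4
    return {**store, (entity, "val"): (store[(entity, "val")] + 1) % 4}


def inc_hat(entity, abs_store):          # static module: same step on sets
    values = abs_store[(entity, "val")]
    return {**abs_store, (entity, "val"): frozenset((n + 1) % 4 for n in values)}


fixpoint = fix([inc_hat], {("x", "val"): frozenset({0})})
concrete = reachable([inc], [{("x", "val"): 0}])
compound_sound = all(in_gamma(store, fixpoint) for store in concrete)
```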

We are now ready to state the main theorem of this work: 




Theorem 1 (Soundness Composition). Let
$\Phi \subseteq \widehat{\mathit{Module}} \times \mathit{Module}$ be a set of
static modules paired with corresponding dynamic modules. Soundness of a
compound analysis follows from soundness of all of its static modules:

If $\mathit{sound}(\hat{f}, f)$ for all $(\hat{f}, f) \in \Phi$ then
$\mathit{sound}(\Phi)$.


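Proof. We show
$\mathit{reachable}(F, \gamma_{\mathit{Store}}(\hat{\sigma}_1)) \subseteq \gamma_{\mathit{Store}}(\mathit{fix}(\hat{F}, \hat{\sigma}_1))$
by well-founded induction on $X \subseteq \mathit{reachable}(F, X)$.

— Base case:
$\mathit{reachable}(F, \emptyset) = \emptyset \subseteq \gamma_{\mathit{Store}}(\mathit{fix}(\hat{F}, \hat{\sigma}_1))$.

— Inductive case: Suppose
$X \subseteq \gamma_{\mathit{Store}}(\mathit{fix}(\hat{F}, \hat{\sigma}_1))$ and
$\hat{\sigma}_n = \mathit{fix}(\hat{F}, \hat{\sigma}_1)$. Then for all
$\sigma \in X \subseteq \gamma_{\mathit{Store}}(\mathit{fix}(\hat{F}, \hat{\sigma}_1))$,
we get $\mathrm{dom}(\sigma) \subseteq \gamma(\mathrm{dom}(\hat{\sigma}_n))$ and
$\sigma(e, \kappa) \in \gamma_{\mathit{Property}}(\hat{\sigma}_n(\hat{e}, \kappa))$
for all $(\hat{e}, \kappa) \in \mathrm{dom}(\hat{\sigma}_n)$ and
$e \in \gamma_{\mathit{Entity}}(\hat{e})$. Furthermore, since $\hat{\sigma}_n$ is
a fixpoint, it holds that $\hat{f}(\hat{e}, \hat{\sigma}_n) \sqsubseteq \hat{\sigma}_n$
for all $\hat{f} \in \hat{F}$ and $\hat{e} \in \mathrm{dom}(\hat{\sigma}_n)$. From
$\mathit{sound}(\hat{f}, f)$ we get
$f(e, \sigma) \in \gamma_{\mathit{Store}}(\hat{f}(\hat{e}, \hat{\sigma}_n)) \subseteq \gamma_{\mathit{Store}}(\hat{\sigma}_n) = \gamma_{\mathit{Store}}(\mathit{fix}(\hat{F}, \hat{\sigma}_1))$
for all $(\hat{f}, f) \in \Phi$, $(e, \_) \in \mathrm{dom}(\sigma)$, and
$(\hat{e}, \_) \in \mathrm{dom}(\hat{\sigma}_n)$ with
$e \in \gamma_{\mathit{Entity}}(\hat{e})$. It follows that
$\mathit{reachable}(F, X) \subseteq \gamma_{\mathit{Store}}(\mathit{fix}(\hat{F}, \hat{\sigma}_1))$.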


We illustrate this theorem by applying it to the reaching-definitions analysis
from Section 2.1. Specifically, soundness of the compound analysis follows from
soundness of modules reachingDefs and controlFlow by Theorem 1:

$$\frac{\mathit{sound}(\widehat{\mathit{reachingDefs}}, \mathit{reachingDefs}) \qquad \mathit{sound}(\widehat{\mathit{controlFlow}}, \mathit{controlFlow})}{\mathit{sound}(\{(\widehat{\mathit{reachingDefs}}, \mathit{reachingDefs}),\ (\widehat{\mathit{controlFlow}}, \mathit{controlFlow})\})}$$


This means $\widehat{\mathit{reachingDefs}}$ can be proven sound independently
from $\widehat{\mathit{controlFlow}}$, even though the modules interact with each
other in the compound analysis. The proof independence is possible because
neither module $\widehat{\mathit{reachingDefs}}$ nor reachingDefs calls the
control-flow modules directly. Instead, both the static and the dynamic module
read the control-flow information from the stores, and the abstract store is
guaranteed to be a sound overapproximation of the concrete store initially
(assumption $\sigma \in \gamma_{\mathit{Store}}(\hat{\sigma})$). Furthermore, only
properties that the reaching-definitions modules themselves wrote to the store
need to be sound overapproximations. Properties that other modules wrote to the
store are not part of the soundness proof of the reaching-definitions modules.
The soundness proof of module $\widehat{\mathit{reachingDefs}}$ is found in the
supplementary material.

To summarize, in this section we developed a theory of compositional sound- 
ness proofs for analyses described in the blackboard architectural style. Each 
static module can be proven sound independently from other modules. Further- 
more, soundness of a whole analysis follows directly from soundness of each 
module. In particular, no reasoning about the analysis as a whole is required. 


4 Reusable Soundness Proofs 


As of now, static modules refer to a specific type of entities, kinds, properties, 
and stores. However, adding new modules to an analysis may require extending 
these types. This invalidates the soundness proofs of existing modules and they
need to be re-established. In this section, we extend our theory to make static 
modules and their soundness proofs reusable. 


4.1 Extending the Type of Entities and Kinds 


We start by explaining how entities and kinds can be extended without invali- 
dating existing soundness proofs. 

For example, if we were to add a taint static module to an existing analysis
over types Entity, Kind, and Store, we would need to extend these types to hold
the new analysis information:

$$\mathit{Entity}' = \mathit{Entity} \mid \mathit{Var} \qquad \mathit{Kind}' = \mathit{Kind} \mid \kappa_{\mathit{Taint}}$$


But this invalidates the proofs of existing modules that depend on the subsets 
Entity and Kind. To solve this problem, we first parameterize the type of modules 
to make explicit what types of entities and kinds they depend on: 


Definition 5 (Parameterized Modules (Preliminary)). We define a type
of module that is parameterized by the types of entities $E$, kinds $K$, and
store $S$:

$$f \in \mathit{Module}[E, K] = \forall S : \mathit{Store}[E, K].\ E \times S \to S$$

Interface $\mathit{Store}[E, K]$ defines read and write operations for an
abstract store type $S$ that restricts access to entities of type $E$ and kinds
of type $K$. The store interface allows us to call parameterized modules with
stores containing supersets of the type of entities and kinds.

For these parameterized modules, we define a sound lifting to supersets of 
entities and kinds: 


lift : ∀E′, K′, E ⊆ E′, K ⊆ K′. Module[E, K] → Module[E′, K′]
lift(f)(e′, σ) = e′ match
  case e: E => f(e, σ)
  case _ => σ


The lifting calls module f on all entities of type E on which f is defined and
simply ignores all other entities, returning the store unchanged. For example,
the lifted reaching-definitions module
$\mathit{lift}[\mathit{Stmt} \mid \mathit{Var},\ \kappa_{\mathit{ReachingDefs}} \mid \kappa_{\mathit{ControlFlowPred}} \mid \kappa_{\mathit{Taint}}](\mathit{reachingDefs})$
operates on the entities Stmt and the kinds
$\kappa_{\mathit{ReachingDefs}} \mid \kappa_{\mathit{ControlFlowPred}}$, but ignores
entities Var and kinds $\kappa_{\mathit{Taint}}$.
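The lifting is a small higher-order function. A minimal Python sketch, assuming entity types are distinguished with `isinstance` and stores are dictionaries; the classes `Stmt` and `Var` and the module `mark_visited` are hypothetical and only serve the illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    label: int


@dataclass(frozen=True)
class Var:
    name: str


def lift(module, entity_type):
    """Extends `module` to a larger entity type by ignoring foreign entities."""
    def lifted(entity, store):
        if isinstance(entity, entity_type):
            return module(entity, store)
        return store                      # other entities: store is unchanged
    return lifted


def mark_visited(stmt, store):
    """Hypothetical module that is only defined on statements."""
    return {**store, (stmt, "Visited"): True}


lifted = lift(mark_visited, Stmt)
after_stmt = lifted(Stmt(1), {})
after_var = lifted(Var("x"), after_stmt)
```

Because the lifted module returns the very same store for entities outside its original type, adding `Var` entities cannot change what the module writes.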

The lifting preserves soundness of the lifted modules for disjoint extensions 
of entities. 


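Definition 6 (Disjoint Extension). Entities $\hat{E}' \supseteq \hat{E}$ and
$E' \supseteq E$ are a disjoint extension iff
$\gamma_{\mathit{Entity}}(\hat{E}) \subseteq E$ and
$\gamma_{\mathit{Entity}}(\hat{E}' \setminus \hat{E}) \subseteq E' \setminus E$.


In other words, the concretization function $\gamma_{\mathit{Entity}}$ does not
mix up entities in $E$ and $E' \setminus E$.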
Lemma 1 (Lifting preserves Soundness). Let
$\hat{f} \in \mathit{Module}[\hat{E}, K]$ and $f \in \mathit{Module}[E, K]$ be a
parameterized static module and dynamic module, $\hat{E}' \supseteq \hat{E}$ and
$E' \supseteq E$ be a disjoint extension of entities, and $K' \supseteq K$ a
superset of kinds.

If $\mathit{sound}(\hat{f}, f)$ then
$\mathit{sound}(\mathit{lift}[\hat{E}', K'](\hat{f}),\ \mathit{lift}[E', K'](f))$.


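Proof. Let $\hat{f} : \mathit{Module}[\hat{E}, K]$ and
$f : \mathit{Module}[E, K]$ be a static and a dynamic module. Furthermore, let
$\hat{e} : \hat{E}'$ and $e \in \gamma_{\mathit{Entity}}(\hat{e})$ be an abstract
and a concrete entity, and let $\hat{\sigma} : \mathit{Store}[\hat{E}', K']$ and
$\sigma \in \gamma_{\mathit{Store}}(\hat{\sigma})$ be an abstract and a concrete
store.

— In case $\hat{e} \in \hat{E}$ then also $e \in E$. Hence,
$\mathit{lift}(\hat{f})(\hat{e}, \hat{\sigma}) = \hat{f}(\hat{e}, \hat{\sigma})$ and
$\mathit{lift}(f)(e, \sigma) = f(e, \sigma)$. Soundness follows by
$\mathit{sound}(\hat{f}, f)$.

— In case $\hat{e} \in \hat{E}' \setminus \hat{E}$ then also
$e \in E' \setminus E$ for all $e \in \gamma_{\mathit{Entity}}(\hat{e})$. Hence,
by the definition of lift,
$\mathit{lift}(\hat{f})(\hat{e}, \hat{\sigma}) = \hat{\sigma}$ and
$\mathit{lift}(f)(e, \sigma) = \sigma$. Soundness follows from the assumption
$\sigma \in \gamma_{\mathit{Store}}(\hat{\sigma})$.
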
This lemma means that we can prove the soundness of static modules once 
for specific types of entities and kinds. Later, we can reuse the modules in a 
compound analysis with extended entities and kinds without having to prove 
soundness again. 


4.2 Changing the Type of Properties 


Next, we extend our theory to allow changing the type of properties without 
invalidating the soundness proofs of existing modules that use them. 

For example, suppose we already have a pointer-static module that propagates
object allocation information $\mathit{Property}[\kappa_{\mathit{val}}] = \mathit{Obj}$.
We may want to track string information as well. This could be done with an
independent string-tracking static module with its own lattice. However, since
tracking strings is mostly identical to tracking pointer information, such an
additional module would duplicate significant amounts of code and require a new
proof from scratch.

Instead, we can reuse the same pointer-static module to propagate string
information Str by changing its lattice to
$\mathit{Property}'[\kappa_{\mathit{val}}] = \mathit{Obj} \times \mathit{Str}$.
However, this invalidates the soundness proof of the pointer-static module as
it depends on type $\mathit{Property}[\kappa_{\mathit{val}}]$.

To solve this problem, we generalize the type of static modules again to be 
polymorphic over the type Property: 


Definition 7 (Parameterized Modules (Final)). We define a type of module
that is parameterized by the type of entities $E$, kinds $K$, properties $P$,
and stores $S$:

$$f \in \mathit{Module}[E, K, I] = \forall P : I,\ S : \mathit{Store}[E, K, P].\ E \times S \to S$$


Interface $\mathit{Store}[E, K, P]$ restricts access to entities of type $E$ and
kinds of type $K$ and contains properties of type $P$. Interface $I$ defines
operations on properties $P$. For example, a pointer-static module may depend on
the Scala-like interface Objects in Listing 1.1. Interface Objects depends on a
type variable Value, which refers to possible values of variables. Function
newObj creates a new object of a certain class and context. Function forObj
iterates over all such objects applying continuation f. Continuation f takes a
class name, context, and store and returns a modified store. Interface Objects
can be instantiated to support different value abstractions. For example,
instance AllocationSite implements the interface with an allocation-site
abstraction
$\mathit{Obj} = \mathit{Obj}(\mathcal{P}(\mathit{Class} \times \mathit{Context}))$,
which abstracts object allocations by their class names and a call string to
their allocation site. Instance AllocationSiteAndStrings implements a reduced
product [9] of objects Obj and strings
$\mathit{Str} = \mathit{Constant}[\mathit{String}]$, which abstracts the value of
strings with a constant abstraction. This allows us to reuse the same
pointer-static module to propagate string information.


trait Objects[Value]:
  newObj(class: Class, ctx: Context): Value
  forObj[S](Value, S)(f: (Class, Context, S) => S): S


object AllocationSite extends Objects[Obj]:
  newObj(class, ctx) = {(class, ctx)}
  forObj[S](Obj(objs), σ̂)(f) = ⨆_{(class, ctx) ∈ objs} f(class, ctx, σ̂)


object AllocationSiteAndStrings extends Objects[Obj × Str]:
  newObj(class, ctx) = ({(class, ctx)}, ⊥)
  forObj[S](value, σ̂)(f) = value match
    case (objs, _) => ⨆_{(class, ctx) ∈ objs} f(class, ctx, σ̂)


Listing 1.1: Interface for different Object Abstractions

Note that certain interfaces may restrict what instances can be implemented.
For example, an abstract domain that only approximates strings but not objects
cannot soundly implement operation forObj in interface Objects. In this case,
interfaces need to be generalized to allow a wider range of instances.
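The interface and its two instances from Listing 1.1 can be approximated in Python as follows. This is a sketch under assumptions of its own: abstract stores are dictionaries whose values are frozensets joined by union, `for_obj` folds the join starting from the input store (so an empty object set leaves the store unchanged), the string component is not modelled beyond a placeholder `BOTTOM`, and the client `record_classes` is a hypothetical module written only against the interface.

```python
from abc import ABC, abstractmethod


def join(a, b):
    """Join of abstract stores: union of the frozensets stored under each key."""
    return {key: a.get(key, frozenset()) | b.get(key, frozenset())
            for key in a.keys() | b.keys()}


class Objects(ABC):
    @abstractmethod
    def new_obj(self, cls, ctx):
        """Creates an abstract value for an object of class `cls` in `ctx`."""

    @abstractmethod
    def for_obj(self, value, store, f):
        """Applies continuation `f` to every object in `value`, joining results."""


class AllocationSite(Objects):
    def new_obj(self, cls, ctx):
        return frozenset({(cls, ctx)})

    def for_obj(self, value, store, f):
        result = store
        for cls, ctx in value:
            result = join(result, f(cls, ctx, store))
        return result


BOTTOM = "bottom"                        # placeholder for the least string value


class AllocationSiteAndStrings(Objects):
    def new_obj(self, cls, ctx):
        return (frozenset({(cls, ctx)}), BOTTOM)

    def for_obj(self, value, store, f):
        objs, _string = value            # only the object component is iterated
        return AllocationSite().for_obj(objs, store, f)


def record_classes(objects, value, store):
    """Hypothetical client module: records the class of every object in `value`."""
    return objects.for_obj(
        value, store,
        lambda cls, ctx, s: join(s, {"classes": frozenset({cls})}))


plain = AllocationSite()
two_objects = plain.new_obj("A", "l1") | plain.new_obj("B", "l2")
classes_plain = record_classes(plain, two_objects, {})

with_strings = AllocationSiteAndStrings()
classes_strings = record_classes(with_strings, with_strings.new_obj("A", "l1"), {})
```

The client never inspects the representation of values, which is what allows the same module to run with either instance.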


4.3 Soundness of Parameterized Modules 


In this subsection, we define soundness of parameterized static modules and 
prove a generalized soundness composition theorem. 


Definition 8 (Soundness of Parameterized Static Modules). A parameterized
static module $\hat{f} : \mathit{Module}[\hat{E}, K, I]$ is sound w.r.t. a
parameterized dynamic module $f : \mathit{Module}[E, K, I]$ iff all their
instances are sound:

$$\mathit{sound}(\hat{f}, f) \iff \forall \hat{P} : I,\ P : I,\ \hat{S} : \mathit{Store}[\hat{E}, K, \hat{P}],\ S : \mathit{Store}[E, K, P].\quad \mathit{sound}(\hat{P}, P) \Rightarrow \mathit{sound}(\hat{f}[\hat{P}, \hat{S}],\ f[P, S])$$

Parameterized static modules are proven sound for all sound instances of
property interface $I$. A static instance $\hat{P} : I$ is sound w.r.t. a
dynamic instance $P : I$ if all of its operations are sound. Soundness for
dynamic and static instances of interface Objects in Listing 1.1 is defined as
follows:

$$\mathit{sound}(\widehat{\mathit{newObj}}, \mathit{newObj}) \iff \forall c,\ \hat{h},\ h \in \gamma(\hat{h}).\ \mathit{newObj}(c, h) \in \gamma_{\mathit{Obj}}(\widehat{\mathit{newObj}}(c, \hat{h}))$$

$$\mathit{sound}(\widehat{\mathit{forObj}}, \mathit{forObj}) \iff \forall \hat{f},\ f.\ \mathit{sound}(\hat{f}, f) \Rightarrow \mathit{sound}(\widehat{\mathit{forObj}}(\hat{f}),\ \mathit{forObj}(f))$$


Soundness of first-order operations like newObj is similar to that of static modules 
(Definition 3). Soundness of higher-order operations like forObj is proven w.r.t. 
all sound functions f. 

Finally, we generalize the soundness composition Theorem 1 to parameter- 
ized static modules. In particular, an analysis composed of parameterized static 
modules is sound if all of its modules are sound and the instance of its property 
interface is sound. 


Theorem 2 (Soundness Composition for Parameterized Static Modules). Let
$\Phi$ be parameterized static modules paired with corresponding dynamic
modules over families of entities $\hat{E}' = \bigcup_i \hat{E}_i$,
$E' = \bigcup_i E_i$, kinds $K' = \bigcup_i K_i$, and properties $\hat{P}$, $P$.

If $\mathit{sound}(\hat{f}, f)$ for all $(\hat{f}, f) \in \Phi$ and
$\mathit{sound}(\hat{P}, P)$ then $\mathit{sound}(\Phi')$,

where
$\Phi' = \{(\mathit{lift}[\hat{E}', K'](\hat{f}),\ \mathit{lift}[E', K'](f)) \mid (\hat{f}, f) \in \Phi\}$.


Proof. We instantiate the polymorphic modules $\hat{f}$, $f$ with the compound
types to obtain
$\mathit{sound}(\mathit{lift}[\hat{E}', K'](\hat{f}),\ \mathit{lift}[E', K'](f))$.
Then the soundness composition Theorem 1 for monomorphic modules applies.


To summarize, in this section we explained how the type of entities, kinds, and 
properties can be changed without invalidating the soundness proofs of existing 
modules. To this end, we generalized the type of modules to be parametric over 
the type of entities, kinds, and properties. The parameterized modules access 
properties via an interface. The instances of this interface are specific to certain 
types of properties and require a soundness proof. 


5 Applicability of the Theory 


In this section, we demonstrate the applicability of our theory by first develop- 
ing four analyses in the blackboard architecture and then proving them sound 
compositionally. 


5.1 Case Studies 


We developed four different analyses in the blackboard architecture (Section 2) 
together with their dynamic semantics (Section 2.2). We proved each analysis 
sound and discuss the proofs in Section 5.2. Each analysis exercises a specific 
part of our soundness theory: 


— A pointer analysis which mutually depends on a call-graph analysis (exercises 
the part of our theory presented in Section 3). 
— A reflection analysis which reuses the pointer analysis to propagate string
information (exercises Section 4.2). 

— A field and object immutability analysis depending on all above analyses 
(exercises Section 4.1). 

— A demand-driven reaching-definitions analysis which demonstrates that our
theory applies to this type of analysis.


Our choice of analyses was inspired by similar but more complex analyses for 
JVM-bytecode implemented in OPAL, which scale to real-world applications [21, 
41]. Our analyses operate on a simpler object-oriented language with the follow- 
ing abstract syntax: 


Class = Class(ClassName, ClassName, Field*, Method*)
Method = SourceMethod(MethodName, Var*, Stmt*) | NativeMethod(MethodName)
Stmt = Assign(Ref, Expr) | Return(Method, Expr) | If(Expr, Stmt*, Stmt*)
     | While(Expr, Stmt*)
Expr = Ref | New(ClassName, (Field × Expr)*) | StringLit(String) | Concat(Expr, Expr)
     | Call(Expr, MethodName, Expr*) | BoolLit(Bool) | Equals(Expr, Expr)
Ref = VarRef(Var) | FieldRef(Ref, Field)


The language features inheritance, mutable memory, class fields, virtual method 
calls, and Java-like reflection [35]. Reflection is modeled as virtual calls to native 
methods. We also deliberately added features such as control-flow constructs and 
boolean operations. These are ignored by the analyses, but need to be modeled 
by dynamic semantics, complicating the soundness proof of the analyses. 

We implemented and tested each analysis in Scala to ensure they are exe- 
cutable. Furthermore, we implemented and tested the corresponding dynamic 
semantics to ensure they are sensible. The code of analyses and dynamic seman- 
tics can be found in the supplementary material accompanying this paper. In 
the following, we discuss the implementation of each analysis in more detail. 


Pointer and Call-Graph Analysis A pointer analysis for an object-oriented 
language computes which objects a variable or field may point to. A call-graph 
analysis determines which methods may be called at specific call sites. Pointer 
and call-graph analyses are the foundation which many other analyses build 
upon. 

The analyses are composed of four static modules, whose dependencies are
visualized in Figure 1. An arrow from a store entry to a module represents a
read; an arrow in the other direction represents a write. Even though all
modules implicitly depend on each other, they can be proven sound independently
from each other (Section 3). This is possible because they do not call other
modules directly; instead, all communication happens via the store.

Fig. 1: Points-To and Call-Graph Static Modules


Module method registers each statement of a method in the store to trigger
other modules. It disregards control flow as the analysis is flow-insensitive
and hence also registers statements that can never be executed. Flow-insensitive
analyses can be more performant than flow-sensitive ones, but traditional
approaches using generic abstract interpreters do not allow for flow-insensitive
analyses. Module pointsTo analyzes New expressions and assignments of variable
and field references. Module virtualCall resolves target methods of virtual
calls based on the receiver object. Once a call is resolved, module invokeReturn
extends the call context and assigns the method parameters and return value.
Finally, it registers the called method as an entity in the store, triggering
module method.

The entities of the analyses are fields, statements, expressions, methods, and
calls:


Entity = (Field × HeapCtx) | (Stmt × CallCtx) | (Expr × CallCtx)
       | (Method × CallCtx) | (Call × CallCtx)
Property[κ_val] = Obj
Property[κ_callTarget] = CallTarget
Obj = Obj(P(Class × HeapCtx))
CallTarget = CallTarget(P(Class × HeapCtx × Method × Expr*))


Each entity is paired with a call context or heap context, which allows the
precision of the analysis to be tuned. The static modules communicate via two
kinds: Kind $\kappa_{\mathit{val}}$ refers to possible values of expressions and
fields and the return value of methods. Values are abstract objects containing
information about where objects were allocated. Kind
$\kappa_{\mathit{callTarget}}$ refers to possible targets of method calls. Call
targets are sets of receiver objects paired with the target method and their
arguments.
To illustrate the analysis, Listing 1.2 shows the code of modules virtualCall
and invokeReturn. They implicitly communicate with each other via the store but
do not call each other directly. Module virtualCall resolves virtual method
calls by first fetching the points-to set of the receiver reference from the
store. Afterwards, it iterates over all possible receivers and fetches possible
target methods from the class table. Finally, it writes the new call target to
the store. Storing the receiver object and argument expressions as part of the
call target allows module invokeReturn to be reused for different types of
calls. If the entity is a Call expression, module invokeReturn first fetches the
targets of the call from the store. Then, it iterates over all targets, extends
the call context with function extendCtx, and binds the parameters to the
values of the arguments and variable this to the receiver object. Furthermore,
it registers the called method as an entity in the store, which in turn triggers
module method to process the statements of the called method. Lastly, module
invokeReturn writes the return value of a method to the method entity in the
store and copies it to call entities of this method.


virtualCall(e, σ̂) = e match
  case (call@Call(receiver, methodName, args), callCtx) =>
    receiverVal = σ̂((receiver, callCtx), κ_val)
    forObj(receiverVal, σ̂) { (class, heapCtx, σ̂′) =>
      method = classTable(class, methodName)
      σ̂′ ⊔ [(call, callCtx), κ_callTarget ↦ newCallTarget(class, heapCtx, method, args)]
    }
  case _ => σ̂


invokeReturn(e, σ̂) = e match
  case (call@Call(_, _, _), callCtx) =>
    targets = σ̂((call, callCtx), κ_callTarget)
    forCallTarget(targets, σ̂) { (class, heapCtx, method, args, σ̂′) => method match
      case SourceMethod(_, params, _) =>
        newCallCtx = extendCtx(call.label, heapCtx, callCtx)
        σ̂′ ⊔ [(call, callCtx), κ_val ↦ σ̂′((method, newCallCtx), κ_val)]
            [(p, newCallCtx), κ_val ↦ σ̂′((a, callCtx), κ_val) | (p, a) ∈ zip(params, args)]
            [(VarRef("this"), newCallCtx), κ_val ↦ newObj(class, heapCtx)]
            [(method, callCtx), κ_val ↦ nullPointer()]
            [(call, callCtx), κ_val ↦ σ̂((method, newCallCtx), κ_val)]
      case NativeMethod(_) => σ̂′
    }
  case (Return(method, expr), callCtx) =>
    σ̂ ⊔ [(method, callCtx), κ_val ↦ σ̂((expr, callCtx), κ_val)]
  case _ => σ̂


Listing 1.2: Static modules for invoking calls and resolving virtual calls.
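As a rough executable counterpart to module virtualCall, consider the following Python sketch. It is not the authors' Scala code: receivers are simplified to variable names, `CLASS_TABLE` is a hypothetical class table, abstract stores are dictionaries of frozensets joined by union, and the iteration over receivers is inlined instead of going through the `forObj` operation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Call:
    receiver: str                 # simplified: receiver is a variable name
    method_name: str
    args: Tuple[str, ...]


VAL, CALL_TARGET = "val", "callTarget"

# Hypothetical class table: (class, method name) -> resolved method
CLASS_TABLE = {("A", "run"): "A.run", ("B", "run"): "B.run"}


def join(a, b):
    """Join of abstract stores: union of the frozensets stored under each key."""
    return {key: a.get(key, frozenset()) | b.get(key, frozenset())
            for key in a.keys() | b.keys()}


def virtual_call(entity, store):
    """Resolves the targets of a virtual call from the receiver's points-to set."""
    expr, call_ctx = entity
    if not isinstance(expr, Call):
        return store                              # not a call: nothing to do
    receivers = store.get(((expr.receiver, call_ctx), VAL), frozenset())
    result = store
    for cls, heap_ctx in receivers:
        method = CLASS_TABLE.get((cls, expr.method_name))
        if method is None:
            continue                              # class has no such method
        target = (cls, heap_ctx, method, expr.args)
        result = join(result, {(entity, CALL_TARGET): frozenset({target})})
    return result


call = Call("r", "run", ("a",))
points_to = {(("r", "main"), VAL): frozenset({("A", "h1"), ("B", "h2")})}
resolved = virtual_call((call, "main"), points_to)
```

A module in the style of invokeReturn would then read the `callTarget` entry from the store without ever calling `virtual_call` itself.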


The modules depend on interface Objects shown in Listing 1.1 and an analogous
interface for call targets. Operations newObj and newCallTarget create new
abstract objects and call targets. Operations forObj and forCallTarget iterate
over all objects and call targets. Interface Objects also includes an operation
nullPointer not shown in the listing, which returns an empty set of object
allocation sites (Obj(∅)). The dynamic instances are analogous except that they
operate on singleton types.

The dynamic modules compute a program's heap and describe its changes during
execution. They are analogous to their static counterparts except that they
operate on singleton types Obj(Class × HeapCtx) and
CallTarget(Class × HeapCtx × Method × Expr*).

All dynamic modules combined do not cover the entire language. In particular,
there are no dynamic modules that cover reflective calls. This means, as of now,
the dynamic semantics of reflection is undefined, and the soundness proof only
covers programs without reflective calls. We address this point with the
following case study.


Fig. 2: Reflection Static Modules


Reflection Analysis Reflection is a language feature that allows programs to
query information about classes and methods at runtime [35]. Our language
supports three reflective methods: Methods Class.forName and Class.getMethod
retrieve classes and methods by a string, respectively. Method.invoke invokes a
method, where the target method is determined at runtime. Reflection is
notoriously difficult to statically analyze soundly and precisely [30]: analyses
need to approximate the content of the string passed into a reflective call. If
the analysis cannot determine the string precisely, it needs to overapproximate
or risk unsoundness. In this case study, we choose the former to be able to
prove the analysis sound.

This case study demonstrates two important features of our formalization: 
First, the reflection analysis reuses all pointer and call-graph modules of the pre- 
vious section (pointsTo, method, virtualCall, and invokeReturn). It extends 
the value lattice to propagate new types of analysis information about strings. 
Even though the pointer analysis propagates new information, it does not re- 
quire any changes and its soundness proof remains valid (Section 4.2). Second, 
the reflection analysis cooperates with the call-graph static module virtualCall 
as reflective calls are regular virtual calls. For example, a call m.invoke(...) where 
variable m is of type Method is first resolved by virtual call resolution and its 
target Method.invoke is then resolved by reflective call resolution. Thus, both 
analyses add elements to the same set of call targets but can be proven sound 
independently from each other (Section 3). 

The reflection analysis extends the Obj values of the pointer analysis with 
three new types of values—Str, Class, and Method—as a reduced product [9]: 


Property[κ_val] = ⊥ | (Obj × Str × Class × Method)
Str = ⊥ | String | ⊤
Class = P(Class) | ⊤
Method = P(Method) | ⊤


String values are approximated with a constant lattice. Class and method values
are approximated with a finite set of classes/methods or ⊤. We reuse the modules
of the pointer and call-graph analysis by implementing a new instance of
interface Objects in Listing 1.1 for the new values. The new instance is similar
to AllocationSiteAndStrings and iterates over all allocation-site information
in strings, class/method values, and other objects.
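The join on this value lattice can be sketched component-wise in Python. The encoding is an assumption of this sketch: `Ext.BOT` and `Ext.TOP` stand for ⊥ and ⊤, string constants are Python strings, and object, class, and method sets are frozensets. The sketch implements a plain product join and omits the reduction step of a reduced product.

```python
from enum import Enum


class Ext(Enum):
    BOT = "bottom"
    TOP = "top"


def join_str(a, b):
    """Constant lattice: equal constants stay, different constants go to top."""
    if a == Ext.BOT:
        return b
    if b == Ext.BOT:
        return a
    return a if a == b else Ext.TOP


def join_set(a, b):
    """Finite sets with an additional top element."""
    if a == Ext.TOP or b == Ext.TOP:
        return Ext.TOP
    return a | b


def join_value(v, w):
    """Join of values in  bottom | (Obj x Str x Class x Method)."""
    if v == Ext.BOT:
        return w
    if w == Ext.BOT:
        return v
    (objs1, str1, classes1, methods1) = v
    (objs2, str2, classes2, methods2) = w
    return (objs1 | objs2,
            join_str(str1, str2),
            join_set(classes1, classes2),
            join_set(methods1, methods2))


v = (frozenset({("A", "h")}), "run", frozenset(), frozenset({"A.run"}))
w = (frozenset(), "stop", Ext.TOP, frozenset({"B.run"}))
joined = join_value(v, w)
```

Here the two different string constants join to ⊤, which is exactly the situation in which the reflection analysis has to overapproximate.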

The reflection analysis adds two new modules to the existing analysis in 
Figure 1. The new modules and their dependencies are visualized in Figure 2. 
reflection(e, σ̂) = e match
  case (call@Call(receiver, method, _), callCtx) =>
    target = σ̂((call, callCtx), κ_callTarget)
    forCallTarget(target, σ̂) { (_, heapCtx, method, args, σ̂′) =>
      method match
        case NativeMethod("invoke") => args match
          case (invokeReceiver :: invokeArgs) =>
            invokeRecVal = σ̂′((invokeReceiver, heapCtx), κ_val)
            methodVal = σ̂′((receiver, callCtx), κ_val)
            reflectiveTarget = methodInvoke(invokeRecVal, methodVal, invokeArgs)
            σ̂′ ⊔ [(call, callCtx), κ_callTarget ↦ reflectiveTarget]
    }
  case _ => σ̂


methodInvoke(recv: Value, methodVal: Value, invokeArgs: Expr*) = methodVal match
  case (_, _, _, methods) => CallTarget({(c, h, m, invokeArgs) |
      (c, h) ∈ recv, m ∈ methods, m ∈ classTable(c)})
  case (_, _, _, ⊤) => CallTarget({(c, h, m, invokeArgs) |
      (c, h) ∈ recv, m ∈ classTable(c)})
  case ⊥ => ⊥


Listing 1.3: Static modules and operations for reflection.


Module reflection analyzes reflective calls to Class.forName, Class.getMethod, and 
Method.invoke. Module string analyzes string literals and concatenation. List- 
ing 1.3 shows an excerpt of module reflection for Method.invoke. Module reflection 
first fetches the targets of a call resolved by module virtualCall. If the call target 
is the native method invoke, module reflection matches on the arguments of the 
virtual call to extract the receiver and arguments of the reflective call target. 
Finally, it calls operation methodInvoke which returns the set of call targets. 

Operation methodInvoke is part of an interface for reflective calls. The
interface contains two other operations for retrieving class names and methods.
Operation methodInvoke matches on the call receiver and the method value. If the
method value contains a finite set of methods, the operation checks if the
receiver class has these methods and adds them as call targets. If the method
value contains ⊤, the operation adds all methods of the receiver class to the
set of call targets. This over-approximates the dynamic module reflection, where
only one method is added as a call target.
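The two cases of methodInvoke can be sketched in Python as follows. The sketch simplifies the paper's operation: it receives the receiver's object set and the method component directly instead of full product values, `TOP` is a hypothetical marker for ⊤, and `CLASS_TABLE` is an invented class table mapping each class to its method names.

```python
TOP = object()                                        # hypothetical marker for top

# Hypothetical class table: class -> names of its methods
CLASS_TABLE = {"A": {"run", "stop"}, "B": {"run"}}


def method_invoke(recv, methods, invoke_args):
    """Call targets of a reflective invoke on receivers `recv`."""
    targets = set()
    for cls, heap_ctx in recv:
        available = CLASS_TABLE.get(cls, set())
        # Unknown method value: every method of the receiver class is a target.
        chosen = available if methods is TOP else available & methods
        targets |= {(cls, heap_ctx, m, invoke_args) for m in chosen}
    return frozenset(targets)


recv = frozenset({("A", "h1"), ("B", "h2")})
precise = method_invoke(recv, frozenset({"stop"}), ("x",))
imprecise = method_invoke(recv, TOP, ())
```

With a known method set only matching methods become targets, whereas ⊤ yields every method of every receiver class.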

The dynamic reflection modules are analogous except that different types of 
values are alternatives. In contrast to Section 5.1, the dynamic pointer and call- 
graph modules combined with the reflection and string modules now cover the 
entire language. Thus, the analysis is sound for all programs, even those using 
reflection. 


Field and Object Immutability Analysis The analysis of this case study 
computes the immutability of objects and their fields inspired by a class and field 
immutability analysis by Roth et al. [41]. This information is useful for assessing
the thread safety of programs, where multiple threads have access to the same 
objects. 

This case study highlights two important features of our formalization. First, 
the core dynamic semantics of our language does not describe the immutability 
property. Therefore, we need to prove the static immutability analysis sound 
with respect to a dynamic immutability analysis. The case study demonstrates 
that the immutability concern can be encapsulated with analysis and dynamic 
modules, added modularly to the existing analysis and dynamic semantics, and 
reasoned about independently (Section 3). It is unclear how this can be achieved 
with a non-modular, monolithic analysis implementation. Second, the immutabil- 
ity analysis adds new types of entities and kinds to the store and reuses all mod- 
ules of the pointer, call-graph, and reflection analysis. Even though the reused 
modules can be called with the new entities and have access to new kinds in the 
store, their soundness proofs remain valid (Section 4.1). 

The immutability analysis adds objects (Class × HeapCtx) to the types of
entities and adds kinds $\kappa_{\mathit{Mut}}$ and $\kappa_{\mathit{Assign}}$ for
their immutability and the assignability of their fields:


Entity′ = Entity | (Class × HeapCtx)
Property[κ_Mut] = TransitivelyImmutable | NonTransitivelyImmutable | Mutable
Property[κ_Assign] = Assignable | NonAssignable


Mutable describes objects whose fields are reassigned. NonTransitivelyImmutable
describes objects whose fields are not reassigned, but some objects transitively
reachable via fields are mutated. TransitivelyImmutable describes objects whose
fields are not reassigned and no transitively reachable objects are mutated.
$\kappa_{\mathit{Assign}}$ uses two elements for reassigned and not reassigned
fields.

The immutability analysis consists of three modules shown in Figure 3. Module
fieldAssign sets fields f of objects o to Assignable for every assignment of the
form x.f = e, where x may point to o. Module fieldMutability sets a field to
Mutable if the field is assignable, to NonTransitivelyImmutable if it is
non-assignable but one of the pointed-to objects is mutable, and to
TransitivelyImmutable otherwise. Lastly, module objectMutability sets an
object's immutability to the least upper bound of the immutability of all of its
fields.
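The decision logic of fieldMutability and objectMutability can be written down in a few lines. The Python sketch below assumes the lattice order TransitivelyImmutable < NonTransitivelyImmutable < Mutable and reads "one of the pointed-to objects is mutable" as "one of the pointed-to objects is anything above TransitivelyImmutable"; both the encoding as an `IntEnum` and this reading are assumptions of the sketch, and the store plumbing is left out.

```python
from enum import IntEnum


class Mut(IntEnum):
    TRANSITIVELY_IMMUTABLE = 0
    NON_TRANSITIVELY_IMMUTABLE = 1
    MUTABLE = 2


def field_mutability(assignable, pointee_mutabilities):
    """Immutability of a field from its assignability and its pointees."""
    if assignable:
        return Mut.MUTABLE
    if any(m > Mut.TRANSITIVELY_IMMUTABLE for m in pointee_mutabilities):
        return Mut.NON_TRANSITIVELY_IMMUTABLE
    return Mut.TRANSITIVELY_IMMUTABLE


def object_mutability(field_mutabilities):
    """Least upper bound over all fields; an object without fields is immutable."""
    return max(field_mutabilities, default=Mut.TRANSITIVELY_IMMUTABLE)


reassigned = field_mutability(True, [])
shallow = field_mutability(False, [Mut.MUTABLE])
deep = field_mutability(False, [Mut.TRANSITIVELY_IMMUTABLE])
whole_object = object_mutability([deep, shallow])
```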

The dynamic modules are analogous except that they operate on concrete 
objects instead of abstract objects. 


Demand-Driven Reaching-Definitions Analysis As a final case study, we 
developed a demand-driven intra-procedural reaching-definitions analysis for our 
object-oriented language. This case study demonstrates that our theory lifts a 
restriction of existing soundness theories for generic interpreters. In particular, 
our theory also applies to analyses that do not follow the program execution 
order. 

The analysis computes which definitions of variables and fields reach a state- 
ment without being overwritten. The analysis is demand-driven, as it performs 
the minimum amount of work to compute the reaching definitions of a query
statement: the analysis only computes the reaching definitions of the query
statement and its predecessors. Also, the analysis does not compute the entire
control-flow graph, but only the query statement’s predecessors.
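
A minimal Python sketch of the demand-driven idea follows. It is a simplification under stated assumptions rather than the analysis from the artifact: statements are identified by hypothetical integer labels, `defined_var` maps a label to the variable it defines, and `predecessors` is a callback that is only invoked for statements the query actually depends on.

```python
def reaching_definitions(query, defined_var, predecessors):
    """Definitions (variable, label) that reach the entry of `query`."""
    # Step 1: discover only the transitive predecessors of the query.
    relevant, worklist = set(), [query]
    while worklist:
        stmt = worklist.pop()
        if stmt in relevant:
            continue
        relevant.add(stmt)
        worklist.extend(predecessors(stmt))

    # Step 2: solve the usual data-flow equations on that subset only.
    reach_in = {stmt: set() for stmt in relevant}
    changed = True
    while changed:
        changed = False
        for stmt in relevant:
            new_in = set()
            for pred in predecessors(stmt):
                var = defined_var.get(pred)
                # A definition survives `pred` unless `pred` overwrites it.
                out = {(v, l) for (v, l) in reach_in[pred] if v != var}
                if var is not None:
                    out.add((var, pred))
                new_in |= out
            if new_in != reach_in[stmt]:
                reach_in[stmt] = new_in
                changed = True
    return reach_in[query]


# Hypothetical program: 1: x = 1; 2: y = 2; 3: if c; 4: x = 3 (then branch);
# 5: use(x, y); 6: z = 0. Statement 6 is never needed for a query on 5.
PREDS = {1: [], 2: [1], 3: [2], 4: [3], 5: [3, 4], 6: [5]}
DEFINED = {1: "x", 2: "y", 4: "x", 6: "z"}
```

Querying statement 5 with `reaching_definitions(5, DEFINED, PREDS.__getitem__)` visits statements 1 to 5 only; statement 6 is never requested from the control-flow callback.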

The analysis consists of two modules reachingDefs and controlFlow similar to
those discussed in Section 2. Module controlFlow calculates the set of
control-flow predecessors of a given statement by computing the set of control-flow exits
of the preceding statement within the abstract syntax tree. For example, the 
control-flow exits of an if statement are the exits of the last statements of both 
branches. The dynamic module controlFlow computes the predecessor immedi- 
ately executed before the given statement. To this end, the module remembers 
the most recently executed statement in a mutable variable and only updates it 
if the given statement is the control-flow successor. 
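
The static computation of predecessors via control-flow exits can be sketched as follows. The tiny AST (`Assign`, `If`), the labels, and the treatment of an empty branch (the condition itself counts as the exit) are assumptions of this sketch and are not taken from the paper's object-oriented language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    label: str


@dataclass(frozen=True)
class If:
    label: str          # label of the condition
    then: tuple = ()    # statements of the then-branch
    orelse: tuple = ()  # statements of the else-branch


def exits(stmt):
    """Labels at which control can leave `stmt`."""
    if isinstance(stmt, Assign):
        return {stmt.label}
    result = set()
    for branch in (stmt.then, stmt.orelse):
        # Exits of an `if` are the exits of the last statement of each branch.
        # Assumption: an empty branch falls through from the condition.
        result |= exits(branch[-1]) if branch else {stmt.label}
    return result


def predecessors(block, target, entry=frozenset()):
    """Predecessors of the statement labelled `target`, or None if absent."""
    for i, stmt in enumerate(block):
        before = exits(block[i - 1]) if i > 0 else set(entry)
        if stmt.label == target:
            return before
        if isinstance(stmt, If):
            for branch in (stmt.then, stmt.orelse):
                found = predecessors(branch, target, entry={stmt.label})
                if found is not None:
                    return found
    return None


PROGRAM = (
    Assign("s1"),
    If("s2", then=(Assign("s3"),), orelse=(Assign("s4"),)),
    Assign("s5"),
)
```

For `PROGRAM`, the predecessors of `s5` are the exits of the preceding `if`, namely `s3` and `s4`, and only the statements on the path to the target are inspected.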

The main challenge in this case study was to find a dynamic module con- 
trolFlow that closely corresponds to the static module and still computes the 
correct control-flow predecessor. With a suitable dynamic module, the sound- 
ness proof of the static module became easier. Furthermore, we validated the 
correctness of the dynamic module with several unit tests. 


5.2 Soundness Proofs of the Case Studies 


We apply our theory to compositionally prove the analyses from the previous 
section sound. The proofs can be found in the supplementary material accom- 
panying this paper. They are pen-and-paper proofs and do not make use of 
mechanization; but due to modularization, they are small and easy to verify. 

Proving each analysis sound includes (a) proving each of its modules sound 
(Definition 8), (b) proving the instances of the property interface sound, and (c) 
verifying that Theorem 2 applies. To ensure the latter, we checked that there 
are no dependencies between modules and that all communication between them 
happens via the store (Definition 1). This can be easily checked by inspecting 
the code of the modules. Furthermore, we verified that modules do not make 
any assumption about abstract domains and are polymorphic in the store
(Definition 7). This can be easily checked by inspecting the polymorphic type of the
modules. 

To prove the individual modules of an analysis sound, step (a) in the overall
soundness proof, we use two techniques. The first uses the observation that static 
modules and their corresponding dynamic modules are often very similar, except 
for the types of entities and properties. We can abstract over these differences 
with a generic module, from which we derive both a dynamic and static module. 
Then, soundness follows immediately as a free theorem from parametricity [28]. 
In cases where abstracting with a generic module is not possible or desirable, we 
resort to a manual proof. We were able to use the first technique for all modules, 
except for method, reachingDefs, and controlFlow. To illustrate cases where
we need manual proofs, consider the flow-insensitive static module method of
the pointer analysis and its corresponding dynamic module method. While we 
could potentially derive them from the same generic module, the derived static 
module would be less performant, because it would trigger the analysis of parts 
of the code, e.g., if conditions, which our current flow-insensitive module does 
not. This is an example where our approach leads to more freedom in the design 
of static analyses than the existing approach based on a generic interpreter 
(Section 6.1). 
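
The first technique can be illustrated with a small Python sketch. It is not taken from the paper: the module, the two value interfaces, and all names are hypothetical, and Python does not enforce parametricity, so the sketch only shows the structure that a typed meta-language would exploit. The generic module never inspects the values it handles, and the dynamic and static modules differ only in the interface instance they pass in.

```python
class ConcreteInts:
    """Dynamic instance: values are plain integers."""

    def add(self, a, b):
        return a + b


class Intervals:
    """Static instance: values are (low, high) intervals."""

    def add(self, a, b):
        return (a[0] + b[0], a[1] + b[1])


def generic_add_module(values, store, target, left, right):
    """Generic module for `target = left + right`.

    It only talks to `values` through its interface, so the same code
    serves as the dynamic and as the static module.
    """
    updated = dict(store)
    updated[target] = values.add(store[left], store[right])
    return updated


def dynamic_add(store, target, left, right):
    return generic_add_module(ConcreteInts(), store, target, left, right)


def static_add(store, target, left, right):
    return generic_add_module(Intervals(), store, target, left, right)
```

With this shape, only the operations of the two instances need a soundness argument (here: the interval sum contains every concrete sum of its members); the module itself contributes no additional proof obligation.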

The soundness proofs of the static modules are reusable across different anal- 
yses, because the modules can be soundly lifted to supersets of entities and kinds 
(Lemma 1). For example, the immutability analysis adds class entities, which
requires lifting the modules of the pointer and reflection analyses. Furthermore, the
soundness proofs of static modules can be reused because the proofs are independent
of the lattices used (Definition 8). For example, the reflection analysis reuses all 
modules of the pointer analysis, extending the value lattice with string, class, 
and method information. The soundness proofs of the pointer static modules 
remain valid because they do not depend on a specific value lattice. Instead, the 
proofs of the pointer modules depend on soundness lemmas of the newObj and 
forObj operations of the Objects interface.

Finally, we consider step (b) in the overall soundness proof — the sound- 
ness proof of the instances of the property interface. These instances need to 
be proven sound manually, as the proof cannot be decomposed any further. To 
prove them sound, we proved each of their operations sound. For the pointer 
analysis we needed to prove 7 operations sound, for the reflection analysis 6 
operations, for the immutability analysis 6 operations, and for the reaching- 
definitions analysis 0 operations. Of these 19 operations, 13 could be proven 
sound trivially, requiring only a single proof step after unfolding the defini- 
tions. The remaining 6 operations required more elaborate proofs with multi- 
ple steps and case distinctions. These include forObj from the pointer analysis, 
classForName, getMethod, and methodInvoke from the reflection analysis, and
getFieldMutability and joinMutability from the immutability analysis. 


6 Related Work 


In this section, we discuss work related to compositional and reusable soundness 
proofs as well as to modular analysis architectures. 


6.1 Theories for Compositional and Reusable Soundness Proofs


All works discussed in this subsection, including our own, build upon the theory 
of abstract interpretation. Abstract interpretation is a formal theory of sound 
static analyses, first conceived by Cousot et al. [12], which has since found
widespread adoption in academia and industry [13, 16, 22, 25, 33, 44]. Abstract
interpretation defines soundness of static analyses but does not explain how 
soundness can be proved. As we elaborate in the introduction, soundness proofs 
of practical analyses for real-world languages are difficult because they relate 
two complicated semantics often described in different styles. Proof attempts of 
such analyses often fail due to high proof complexity and effort. Furthermore, 
existing proofs are prone to become invalid if the static or dynamic semantics 
change, and reestablishing proofs is laborious and complicated.

Domain constructions, such as reduced products and reduced cardinal pow- 
ers [12], combine multiple existing abstract domains to improve their precision. 
They can be used to compose the soundness proof of operations on the abstract 
domain, e.g., primitive arithmetic, boolean, or string operations. However, they
cannot be used to compose the soundness proof of the analysis of statements, e.g., 
assignments, loops, or procedure calls. In contrast, the blackboard architecture 
is capable of composing soundness proofs of both of these types of operations.

Darais et al. [14] developed a theory for soundness proofs, in which the static 
and dynamic semantics are derived from a small-step generic interpreter that 
describes the operational semantics of the language without mentioning details 
of static or dynamic semantics. The small-step generic interpreter is instantiated 
with reusable Galois transformers that capture aspects such as flow- or
path-sensitivity and allow changing an existing analysis while preserving soundness.
Galois transformers can be proven sound once and for all and their soundness 
proofs are reusable across different analyses. However, the approach does not 
compose soundness proofs of static semantics derived from the generic inter- 
preter. 

Keidel et al. [28] developed a theory for big-step abstract interpreters, deriv- 
ing both the static and dynamic semantics from a generic big-step interpreter. 
The theory enables soundness composition [28, Theorem 4 and 5] if the generic 
interpreter is implemented with arrows [23] or in a meta-language which
enjoys parametricity. However, there is no theory of how parts of soundness proofs
can be reused between different analyses. Keidel et al. [27] later refined the theory by
introducing reusable analysis components that capture different aspects of the 
language such as values, mutable state, or exceptions and are described with 
arrow transformers [23]. While components can be proven sound independently 
from each other, their composition requires glue code, which needs to be proven 
sound. Furthermore, the composition creates large arrow transformer stacks
that, unless optimized away by the compiler, may lead to inefficient analysis
code. For example, a taint analysis for WebAssembly developed using the
approach depends on a stack of 18 arrow transformers. Eliminating the overhead
of an arrow transformer stack of this size requires aggressive inlining and
optimizations, causing binary bloat and excessive compile times.

Bodin et al. [5] developed a theory of compositional soundness proofs for
a style of semantics called skeletal semantics, which consists of hooks (recur- 
sive calls to the interpreter), filters (tests if variables satisfy a condition), and 
branches. The dynamic and static semantics are derived from the same skeleton. 
Also, soundness of the instantiated skeleton follows from soundness of the dy- 
namic and static instance [5, Lemma 3.4 and 3.5]. However, their work does not 
describe how proofs can be reused across different analyses. 

To recap, in all theories above the static and dynamic semantics must be 
derived from the same generic interpreter. This restricts what types of analyses 
can be derived. In particular, the static analysis must closely follow the pro- 
gram execution order dictated by the generic interpreter and it is unclear how 
static analyses can be derived that do not closely follow the program execution 
order. For example, backward analyses process programs in reverse order, flow- 
insensitive analyses may process statements in any order, and summary-based 
analyses construct summaries in bottom-up order. Our work lifts the
restriction that static and dynamic semantics must be derived from the same artifact.
Static modules and corresponding dynamic modules must follow the blackboard
architecture style, but otherwise do not need to share any commonalities. This gives
greater freedom as to which types of analyses can be implemented. For exam- 
ple, the blackboard analysis architecture has been used in prior work to develop 
backward analyses [17], on-demand/lazy analyses [19,41], and summary-based 
analyses [21]. We also demonstrated in Section 5.1 that our theory applies to a 
demand-driven reaching definitions analysis. It is unclear how such an analysis 
can be derived from a generic interpreter. 


6.2 Modular Analysis Architectures 


Modular analysis architectures describe how to implement static analyses
modularly. They are a prerequisite for developing theories for
compositional and reusable soundness proofs, which in turn give formal
guarantees about proof independence, composition, and reuse.

Our work formally defines the blackboard analysis architecture used in the 
OPAL framework [15,21]. In the past, OPAL has been used to implement state- 
of-the-art analyses for method purity [19], class- and field-immutability [41], and 
call-graphs [40] for Java Virtual Machine bytecode. Furthermore, OPAL features 
escape analyses and a solver for IFDS analyses [21] as well as a fixpoint algorithm 
that parallelizes the analysis execution [20]. 

Prior to the work presented in this paper, no formalization of the blackboard 
analysis architecture and no theory for its soundness existed. Our formalization 
captures the core of the OPAL framework, while deliberately ignoring imple- 
mentation details. For example, our formalization does not describe the fixpoint 
algorithm and the order in which it executes static modules to resolve their 
dependencies. Proving the fixpoint algorithm correct is a separate concern
from proving analyses sound, which is the focus of our formalization. That
said, our formalization covers a variety of OPAL’s features described by Helm et 
al. [21]. For example, OPAL supports default and fallback properties for missing 
properties in the store. Fallback properties can be described by our
formalization by adding them to the initial store passed to the fixpoint algorithm. We
deliberately leave out default properties, which are an edge case in OPAL to 
mark properties not computed, e.g., because of dead code. They could be added 
to our formalization by extending analyses with a second set of static modules 
to be executed after the fixpoint is reached. Furthermore, OPAL supports opti- 
mistic analyses which ascend the lattice and pessimistic analyses which descend 
the lattice during fixpoint iteration. Both of these are covered by our formaliza- 
tion which describes analyses as monotone functions that ascend or descend the 
lattice. However, we deliberately do not cover OPAL’s mechanisms for allowing 
interaction between optimistic and pessimistic analyses, another edge case. 


Configurable program analysis (CPA) [4] is a modular analysis architecture 
that describes analyses with transfer relations between control-flow nodes. CPAs
can be systematically composed with reduced products. Furthermore, soundness 
of a component-wise transfer relation follows directly from soundness of its con- 
stituents. However, it is unclear how soundness proofs of primitive CPAs can be 
composed or how proof parts can be reused across analyses. 


Doop [7] is a framework which describes analyses with relations in Datalog.
Each relation is defined as a set of rules. These rules can be modularly added 
or replaced, without requiring changes to other rules. While individual analy- 
ses in Doop have been proven sound [43], the proofs are not compositional or 
reusable. In particular, if one rule changes, the proof becomes invalid and needs 
to be reestablished. This is because the proof reasons about soundness of all 
rules at once instead of individual rules or relations. The IncA framework [45] 
also describes analyses in Datalog, but allows relations over lattices instead of 
only sets. However, no soundness theory for its analyses exists. Similar to IncA, 
the Flix framework [37] describes analyses with lattice-based Datalog relations
and functions. Flix proves individual functions sound with an automated the- 
orem prover [36]. While an automated theorem prover reduces the proof effort 
and increases proof trustworthiness, there is no guarantee that the automated 
theorem prover is able to conduct a proof. Furthermore, the automated theorem 
prover does not establish a soundness proof of Datalog relations. 


Verasco [26] is a modular analysis for C#minor [32], an intermediate lan- 
guage used by the CompCert C compiler [33]. Verasco is proven sound with 
the Coq proof assistant [3]. The soundness proof of the abstract C#minor
semantics is independent of the abstract domain, which makes the proof reusable
for other abstract domains. However, the abstract semantics is proven sound 
w.r.t. the standard concrete semantics. Thus, the proof cannot be reused for 
abstract semantics which approximate non-standard concrete semantics, such as 
information flow analyses [2] or liveness analyses [11]. 


Several other modular analysis architectures [24,31,42] do not have formal 
theories for soundness. 


6.3 Monolithic Soundness Proofs


In this subsection, we compare compositional and reusable soundness proof the- 
ories to ad-hoc monolithic proofs and discuss their trade-offs. 

Monolithic soundness proofs consider the entire analysis and dynamic se- 
mantics as a whole. This complicates the proof because there is no separation 
of concerns to manage the complexity of modern programming languages. Fur- 
thermore, monolithic soundness proofs are harder to maintain. In particular, 
whenever the analysis needs to be updated to support a new version of the lan- 
guage, or whenever the analysis is fine-tuned to improve precision and scalabil- 
ity, the soundness proof becomes invalid and needs to be reestablished. However, 
reestablishing the soundness proof is difficult because it is unclear which parts 
of the proof have become invalid and need to be updated. In contrast,
compositional soundness proofs narrow the proof scope to individual modules, which
decreases the proofs’ complexity. Furthermore, compositional soundness proofs 
are easier to maintain as changes to individual modules only invalidate their 
particular soundness proof, while the proofs of other modules remain valid. 

The main benefit of monolithic soundness proofs over compositional proofs 
is that analyses may be proven sound with respect to existing formal dynamic 
semantics. However, often no suitable formal dynamic semantics exists, and
analyses still have to be proven sound with respect to custom-defined or modified
dynamic semantics. For example, HornDroid [8] is proven sound with respect to
a custom instrumented JVM small-step semantics, and Jaam⁶ is proven sound
with respect to a custom JVM semantics in the form of an abstract machine [22].
Furthermore, analyses of properties not present in standard language semantics 
need to be proven sound with respect to instrumented dynamic semantics. For 
example, a static taint analysis needs to be proven sound with respect to an 
instrumented dynamic semantics with taint information. In contrast, composi- 
tional soundness proofs require a one-time cost of formalizing a modular dynamic 
semantics for a language. Once this is done, several analyses can be proven sound 
with respect to this dynamic semantics. Furthermore, the dynamic semantics can 
be modularly extended to describe new aspects such as taint information. 


7 Future Work 


In this section, we discuss limitations of our work and how these limitations can 
be addressed in the future. 

First, our soundness theory requires that static analyses and dynamic se- 
mantics are described in the blackboard analysis architecture. It is unclear how 
easily existing analyses and dynamic semantics can be adapted to the architecture.
In Section 2.2, we showed how existing small-step dynamic semantics can be 
described as a module and Helm et al. [21] implemented a wide range of static 
analyses in the architecture. In the future, we want to investigate how other 
styles of static and dynamic semantics can be adapted to the architecture. 


6 https://github.com/Ucombinator/jaam

Second, our soundness theory requires that all static modules are sound.
However, in practice static analyses are deliberately unsound due to complicated 
language features [34]. In the future, we want to investigate how the blackboard 
analysis architecture can be used to localize unsoundness. Specifically, unsound 
analysis results could be tagged with the name of the module that produced 
them. All results derived from unsound results then propagate the tags. This 
way, it is always clear which results are potentially unsound and which modules 
caused unsoundness. 
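
One possible reading of this tagging idea is sketched below. It is purely illustrative: the paper leaves the design to future work, so the `Tagged` wrapper, the `derive` helper, and the module names used in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    value: object
    unsound_by: frozenset = frozenset()  # modules that caused unsoundness


def derive(compute, *inputs, module, unsound=False):
    """Compute a result from tagged inputs and propagate their tags.

    `module` names the producing module; `unsound=True` marks a module
    that deliberately under-approximates.
    """
    tags = frozenset().union(*(i.unsound_by for i in inputs))
    if unsound:
        tags |= {module}
    return Tagged(compute(*(i.value for i in inputs)), tags)


# Example: a reflection module that is deliberately unsound taints every
# result that is derived from its output, directly or indirectly.
reflective = derive(lambda: {"o1"}, module="reflection", unsound=True)
allocated = derive(lambda: {"o2"}, module="pointer")
callees = derive(lambda a, b: a | b, reflective, allocated, module="callGraph")
```

Here `callees.unsound_by` contains only `"reflection"`, so a user can see both that the call-graph result is potentially unsound and which module is responsible.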

Lastly, our work has focused on soundness, i.e., analyses do not produce
false-negative results. A complementary property to soundness is completeness,
i.e., analyses do not produce false-positive results. The absence of false
positives is especially important if analyses produce warnings that are to be inspected by
developers. In the future, we want to investigate if our theory can be extended 
to prove completeness of static analyses. 


8 Conclusion 


In this work, we developed a theory for compositional and reusable soundness 
proofs for static analyses in the blackboard analysis architecture. The blackboard 
analysis architecture modularizes the implementation of static analyses with 
analyses composed of independent static modules. We proved that soundness of 
an analysis follows directly from independent soundness proofs of each module. 
Furthermore, we extended our theory to enable the reuse of soundness proofs 
of existing modules across different analyses. We evaluated our approach by
implementing four analyses and proving them sound: a pointer and call-graph
analysis, a reflection analysis, an immutability analysis, and a demand-driven
reaching-definitions analysis.


9 Data Availability 


The implementation of the case studies and proofs are provided as an artifact 
available at https://doi.org/10.5281/zenodo.10418484. 


Abstract. Exception handling is a key feature in modern programming
languages. Exceptions can be used to deal with errors, or as a means
to control the flow of execution of a program. Since they might
unexpectedly terminate a program, unhandled exceptions are a serious safety
concern. We propose a static analysis to detect uncaught exceptions in
functional programs, which is defined as an abstract interpreter. It
computes a description of the values potentially returned by a program using
a novel abstract domain that can express inductively defined sets of
values. Simultaneously, the analysis infers the possibly raised exceptions
by computing in the abstract exception monad. This abstract interpreter
has been implemented as an effective static analyser for a large subset of
OCaml programs that supports mutable data types, the OCaml module
system, and dynamically extensible data types such as the exception type.
The analyser has been evaluated on several hundred OCaml programs.


Keywords: Static Analysis · Exceptions · Higher-Order Programs ·
Abstract Interpretation · Abstract Domain for Trees


1 Introduction 


Programs that run in critical environments need to comply with strong safety 
guarantees. The minimal guarantee one expects for critical software is the absence 
of runtime failures. Sound static analyses can provide such guarantees statically, 
for every possible execution of a program, and in a fully automatic manner. 

The static typing discipline found in the ML family of languages is one such
static analysis technique, which brought strong safety guarantees to programs at a
very low cost: well-typed programs cannot “go wrong” [48]. This soundness theorem
for well-typed ML programs, however, does not preclude programs from abruptly 
ending with uncaught exceptions. Several analyses for ML-like languages have 
been developed to detect such undesirable behaviours, that were either leveraging 
type and effect systems [38,54], or that were based on variants of control-flow
analyses or set constraints [681671415166]. The recent success of algebraic effects 
and their introduction in popular languages such as OCaml [37] has renewed the 
interest in the static detection of uncaught exceptions and effects. 

Analysing uncaught exceptions in ML is a difficult problem, because data
flow and control flow are interdependent. This is not only due to the first class 
nature of functions, but also due to the first class nature of exceptions themselves, 
e.g., they can be taken as parameters, recorded in data structures or in mutable 
references. Furthermore, exceptions can carry any value as argument—including 
functions—and new exceptions can be dynamically generated at runtime. 

In this paper, we propose a static analysis for a higher-order language, in 
which exceptions are first-class values. The analysis is based on the abstract 
interpretation framework [9]. It is a forward value analysis that infers which values 
any program point can compute, and which exceptions they might raise. For this 
purpose, we introduce a novel abstract domain that can represent recursively 
defined sets of values. We define a widening operator for this abstract domain, 
that is responsible for finding recursive generalisations of solutions. 

Our analysis leverages this abstract domain to represent both possible values 
and exceptions, thanks to the abstract exception monad. This monad—that can 
also be used as an abstract domain—is an abstraction of the exception monad, 
that collects all values and exceptions. 

We define our analysis as a big-step monadic interpreter, written in the open 
recursive style that was emphasised in the “Abstracting Definitional Interpreters”
approach [11]. Then, we obtain an effective analyser by applying a generic,
dynamic fixpoint solver [6635924230]. We prove that our analysis is sound, 
under the soundness assumption of the fixpoint solver. 

We extend the analysis to handle a large subset of the OCaml language. In 
particular, it supports the dynamic creation of exceptions, mutable state, modules 
and functors. The analysis is so far limited to sequential programs that do not 
perform system calls, do not use the Gc or Obj modules, and do not employ 
recursive modules, general recursive definitions of values, objects, classes, arrays, 
or floats. We implemented an OCaml prototype for this analyser. It reports the 
possibly thrown exceptions and an over-approximation of the data they carry, 
along with an abstraction of the call trace that led to the program point where 
the exception was raised. We discuss some implementation choices, and evaluate 
the precision and performance of our analyser on 290 programs, that include 
examples from the literature and from the OCaml compiler’s test suite. 


2 Overview 


Let us consider the classic example of the factorial function, as written below in 
a continuation passing style. 


let rec fact_cont n i k =
  if i >= n then k i else
    fact_cont n (i + 1) (fun x -> k (x * i))
let fact n = fact_cont n 1 (fun x -> x)
let result = fact 5


The fact_cont function recursively calls itself with increasing values of its 
parameter i, until the value n is reached. 

We are interested in finding which values (and exceptions) this program might
return. To answer this question, we first need to find the possible continuations 
the function fact_cont can be called with, and, importantly, we need an abstract 
domain in which we can express this set, or an over-approximation thereof. 

With the abstract domain that we introduce, we can express such a set
as the following abstract value:


\[
\mu\alpha.\ \{\mathit{funs} : \{(\lambda x.\, x) \mapsto \{\};\ (\lambda x.\, k\,(x * i)) \mapsto \{i \mapsto \{\mathit{ints} : [1, +\infty]\};\ k \mapsto \alpha\}\}\}
\]


This abstract value represents a recursively defined set (as indicated by the μ
constructor) that is locally named α. This set is composed of function closures,
that can be either the identity function, or the function λx. k (x * i), considered
in an environment where the variable i is bound to an integer that is greater than or
equal to 1, and where the variable k is recursively bound to the local variable α,
i.e., to a value of the set we are defining.
Our abstract domain can also express structural invariants on data, such as 
the one for red-black trees [52], that forbids red nodes from having red children: 
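\[
\mu\alpha.\ \left\{\mathit{constructs} : \left\{
\begin{array}{l}
E : ();\\
R : \left(\{\mathit{constructs} : \{E : (),\ B : (\alpha, \{\mathit{ints} : \top\}, \alpha)\}\},\ \{\mathit{ints} : \top\},\ \{\mathit{constructs} : \{E : (),\ B : (\alpha, \{\mathit{ints} : \top\}, \alpha)\}\}\right);\\
B : (\alpha, \{\mathit{ints} : \top\}, \alpha)
\end{array}
\right\}\right\}
\]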


Our abstract domain bears a strong similarity with the theory of equi-recursive 
types [56], in the sense that recursion is a core aspect of our definition. However, 
it differs from recursive types, as function types are absent: sets of closures are 
used instead. Moreover, it is parameterised by a non-relational abstract domain 
used to represent integer values, which is not possible with simple type systems.

We leverage our abstract domain and define a static analysis for a call-by-value
λ-calculus with pattern matching, exception handling, and first-class exceptions
(§3). In this language, the order of evaluation is made explicit by let bindings,
and pattern matching is exhaustive and non-ambiguous [8]. These requirements
drastically simplify the semantics of programs and their analysis. The analysis is
defined as an abstract interpreter that performs a forward value analysis (§5).

Based on this small abstract interpreter, we sketch (§6) several extensions
that we implemented to obtain a static analyser for a subset of OCaml programs.
The implementation uses an intermediate language that is close to the λ-calculus
described above, into which we translated the OCaml typed abstract syntax tree. We evaluated
the precision and performance of our analyser on 290 OCaml programs, written
in a variety of styles (direct, CPS, monadic, etc.). We discuss these experimental
results (§7), cover related work (§8), and finish with concluding remarks (§9).


3 A λ-calculus With Exceptions


We introduce as an intermediate language a λ-calculus with pattern matching
and exception handling. Its syntax resembles the monadic normal form, where 
the order of evaluation is made explicit with let bindings. 
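
Definition 1. Given C a set of constructor symbols, we give the following
inductive definition of patterns p, q, and expressions t, u, r:

\[
\begin{array}{rcl}
p, q \in P & ::= & x \mid n \mid c(p_1, \ldots, p_k) \mid p_1 + p_2 \mid p \setminus q \\[4pt]
t, u, r \in T & ::= & x \mid n \mid x_1\ \mathit{op}\ x_2 \mid c(x_1, \ldots, x_k) \\
 & \mid & \mu f.\lambda x.\, t \mid f\ y \mid \mathtt{let}\ x = t\ \mathtt{in}\ u \mid \mathtt{raise}\ x \\
 & \mid & \mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow u_1 \mid \cdots \mid p_n \Rightarrow u_n \\
 & \mid & \mathtt{dispatch}\ t\ \mathtt{with}\ \mathtt{val}\ x \Rightarrow u \mid \mathtt{exn}\ y \Rightarrow r
\end{array}
\]

where n is a constant integer, c is a constructor of C, op is a binary operation
on integers, and where the pattern q cannot contain any complement p_1 \ p_2.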


We consider a pattern syntax and formalism inspired from [8]. The pattern 
disjunction p+q matches any value matched by p or q, and the pattern complement 
p \ q matches any value that is matched by p but not by q. 

As in the OCaml typed AST, variables carry a type. We may write x_τ to
denote that the variable x is of type τ. Patterns are linear, i.e., sub-patterns of
constructor patterns cannot share variables. All functions are recursive by default.
If f does not occur in the expression t, then we write λx. t instead of μf. λx. t.

The values of this language are integer constants, constructors applied to 
values, and function closures, that contain an environment of values: 


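\[
\begin{array}{rcl}
v \in V & ::= & n \mid c(v_1, \ldots, v_n) \mid (E, \mu f.\lambda x.\, t) \quad \text{where } \mathrm{dom}\, E = \mathrm{fv}(\mu f.\lambda x.\, t) \\
E \in \mathbb{E} & ::= & [\,] \mid E, x \mapsto v
\end{array}
\]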


Patterns induce a matching relation over values, that is described, with regard 
to a given environment E, by recursion on patterns: 


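\[
\begin{array}{rcl}
x \prec\!\!\prec_E v & \iff & E(x) = v \\
c(p_1, \ldots, p_n) \prec\!\!\prec_E c(v_1, \ldots, v_n) & \iff & \bigwedge_i\, p_i \prec\!\!\prec_E v_i \\
p + q \prec\!\!\prec_E v & \iff & p \prec\!\!\prec_E v \ \lor\ q \prec\!\!\prec_E v \\
p \setminus q \prec\!\!\prec_E v & \iff & p \prec\!\!\prec_E v \ \land\ q \not\prec\!\!\prec v
\end{array}
\]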

We say that a pattern p matches a value v, denoted p ≺≺ v, iff there exists an
environment E such that p ≺≺_E v. In such a case, we write E(p ≺≺ v) for the smallest
environment E' such that p ≺≺_E' v.

Thanks to this pattern-matching formalism, we can focus on the class of
programs where pattern matching is exhaustive and non-ambiguous, i.e., in a
term match t with p_1 ⇒ u_1 | ··· | p_n ⇒ u_n where t : τ, we require that for any
value v : τ, there exists a unique 1 ≤ i ≤ n such that p_i ≺≺ v. The work presented
in [8] shows how to disambiguate patterns, i.e., how to make any pattern match
non-ambiguous. We restrict ourselves to non-ambiguous patterns, because it
simplifies both the dynamic semantics and the analysis of programs.

We present in Figure 1 a call-by-value big-step semantics for our language.
We write $t \Downarrow_{\mathrm{val}} v$ to denote that the expression $t$ reduces to the value $v$, and
we write $t \Downarrow_{\mathrm{exn}} v$ to denote that the reduction of $t$ raises an exception evaluated
as $v$. In this language, any value can be raised as an exception. The evaluation
rules are mostly standard. We briefly explain the rules for match and dispatch.
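\[
\begin{array}{c}
\dfrac{}{E \vdash x \Downarrow_{\mathrm{val}} E(x)}\ \textsc{Var}
\qquad
\dfrac{}{E \vdash n \Downarrow_{\mathrm{val}} n}\ \textsc{Int}
\qquad
\dfrac{}{E \vdash x_1\ \mathit{op}\ x_2 \Downarrow_{\mathrm{val}} E(x_1)\ [\![\mathit{op}]\!]\ E(x_2)}\ \textsc{Op}
\\[14pt]
\dfrac{}{E \vdash \mathtt{raise}\ x \Downarrow_{\mathrm{exn}} E(x)}\ \textsc{Raise}
\qquad
\dfrac{}{E \vdash c(x_1, \dots, x_k) \Downarrow_{\mathrm{val}} c(E(x_1), \dots, E(x_k))}\ \textsc{Const}
\\[14pt]
\dfrac{E' = E|_{\mathrm{fv}(\mu f.\lambda x.t)}}{E \vdash \mu f.\lambda x.t \Downarrow_{\mathrm{val}} (E', \mu f.\lambda x.t)}\ \textsc{Fun}
\qquad
\dfrac{E(y) = (E', \mu f.\lambda x.t) \qquad E', f \mapsto E(y), x \mapsto E(z) \vdash t \Downarrow_m v}{E \vdash y\ z \Downarrow_m v}\ \textsc{App}
\\[14pt]
\dfrac{E \vdash t_1 \Downarrow_{\mathrm{val}} v_1 \qquad E, x \mapsto v_1 \vdash t_2 \Downarrow_m v_2}{E \vdash \mathtt{let}\ x = t_1\ \mathtt{in}\ t_2 \Downarrow_m v_2}\ \textsc{Let}
\qquad
\dfrac{E \vdash t_1 \Downarrow_{\mathrm{exn}} v}{E \vdash \mathtt{let}\ x = t_1\ \mathtt{in}\ t_2 \Downarrow_{\mathrm{exn}} v}\ \textsc{LetRaise}
\\[14pt]
\dfrac{E \vdash t \Downarrow_{\mathrm{val}} v' \qquad p_i \prec v' \qquad E, E(p_i \prec v') \vdash u_i \Downarrow_m v \qquad 1 \le i \le n}{E \vdash \mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow u_1 \mid \cdots \mid p_n \Rightarrow u_n \Downarrow_m v}\ \textsc{Match}
\\[14pt]
\dfrac{E \vdash t \Downarrow_{\mathrm{exn}} v}{E \vdash \mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow u_1 \mid \cdots \mid p_n \Rightarrow u_n \Downarrow_{\mathrm{exn}} v}\ \textsc{MatchRaise}
\\[14pt]
\dfrac{E \vdash t \Downarrow_m v \qquad E, x_m \mapsto v \vdash u_m \Downarrow_{m'} v'}{E \vdash \mathtt{dispatch}\ t\ \mathtt{with}\ \mathtt{val}\ x_{\mathrm{val}} \Rightarrow u_{\mathrm{val}} \mid \mathtt{exn}\ x_{\mathrm{exn}} \Rightarrow u_{\mathrm{exn}} \Downarrow_{m'} v'}\ \textsc{Dispatch}
\end{array}
\]

Fig. 1. Big-step semantics.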


The non-ambiguous pattern-matching simplifies the semantics of the term
$\mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow u_1 \mid \cdots \mid p_n \Rightarrow u_n$, as only one pattern can match the value
of $t$, and thus only one branch is considered during the evaluation.

The rule DISPATCH deals with exception handling: the evaluation of the term
$\mathtt{dispatch}\ t\ \mathtt{with}\ \mathtt{val}\ x_{\mathrm{val}} \Rightarrow u_{\mathrm{val}} \mid \mathtt{exn}\ x_{\mathrm{exn}} \Rightarrow u_{\mathrm{exn}}$ first evaluates $t$. If $t$ reduces to a
value, then the value branch $u_{\mathrm{val}}$ is evaluated. Otherwise, if $t$ raises an exception,
the exception branch $u_{\mathrm{exn}}$ is evaluated. In both cases, the value or the exception
is added to the environment of the corresponding branch.
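As an illustration of the rules LET, LETRAISE, RAISE and DISPATCH, here is a minimal OCaml sketch of a big-step evaluator for a fragment of the language. It is an assumption-laden toy: values are integers only, the sole binary operation is addition, and the type and constructor names are hypothetical.

```ocaml
(* A fragment of the language of Definition 1 (illustrative names). *)
type value = Int of int

type expr =
  | Var of string
  | Const of int
  | Add of string * string   (* x1 op x2, with op fixed to + *)
  | Let of string * expr * expr
  | Raise of string
  | Dispatch of expr * (string * expr) * (string * expr)

(* The two evaluation modes: normal result, or raised exception. *)
type outcome = Val of value | Exn of value

let rec eval (env : (string * value) list) (e : expr) : outcome =
  match e with
  | Var x -> Val (List.assoc x env)
  | Const n -> Val (Int n)
  | Add (x1, x2) ->
      let (Int a) = List.assoc x1 env and (Int b) = List.assoc x2 env in
      Val (Int (a + b))
  | Let (x, t1, t2) -> (
      match eval env t1 with
      | Val v1 -> eval ((x, v1) :: env) t2   (* rule LET *)
      | Exn v -> Exn v)                      (* rule LETRAISE *)
  | Raise x -> Exn (List.assoc x env)        (* any value can be raised *)
  | Dispatch (t, (xv, uv), (xe, ue)) -> (
      (* rule DISPATCH: bind the result in the branch chosen by the mode *)
      match eval env t with
      | Val v -> eval ((xv, v) :: env) uv
      | Exn v -> eval ((xe, v) :: env) ue)

(* let x = 1 in dispatch (let y = raise x in y) with val v => v | exn e => e + x *)
let example =
  Let ("x", Const 1,
       Dispatch (Let ("y", Raise "x", Var "y"),
                 ("v", Var "v"),
                 ("e", Add ("e", "x"))))
```

Evaluating `example` in the empty environment raises the value 1 inside the scrutinee, which the exception branch catches and adds to `x`.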


4 An Abstract Domain for Regular Sets of Values 


In this section, we define an abstract domain that is able to represent inductively
defined sets of values of our programming language. It is parameterised over a
non-relational, numeric abstract domain $\mathbb{I}$, which provides a concretisation function
$\gamma_{\mathbb{I}} : \mathbb{I} \to \wp(\mathbb{Z})$, a test for the abstract inclusion pre-order, and operations for
union, intersection and widening, with the standard soundness conditions. For
instance, the soundness of abstract union is stated: $\gamma_{\mathbb{I}}(I_1) \cup \gamma_{\mathbb{I}}(I_2) \subseteq \gamma_{\mathbb{I}}(I_1 \sqcup_{\mathbb{I}} I_2)$.
The definition of our abstract domain follows:


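Definition 2 (Abstract values).

\[
\begin{array}{rcll}
A \in \mathbb{A} & ::= & \{\mathrm{ints} : I;\ \mathrm{constructs} : C;\ \mathrm{funs} : F\} \mid \alpha \mid \mu\alpha.A & \text{(Abstract value)} \\
I \in \mathbb{I} & ::= & \text{any numeric abstract domain} & \text{(Abstract integers)} \\
C \in \mathbb{C} & ::= & \{c \mapsto (A, \dots, A)\} \mid \top & \text{(Abstract constructs)} \\
F \in \mathbb{F} & ::= & \{\mu f.\lambda x.t \mapsto E^\sharp\} \mid \top & \text{(Abstract closures)} \\
E^\sharp \in \mathbb{E}^\sharp & ::= & \{x \mapsto A\} & \text{(Abstract environment)}
\end{array}
\]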
An abstract value, written $A$, describes which integers it denotes (in the
field ints), which values whose head is a constructor it denotes (in the field
constructs), and which function closures it denotes (in the field funs). The integer
values are described by a numeric abstract domain that is taken as parameter.

The constructed values are described by a map whose keys are the possible
head constructors of the values, and whose data are tuples of abstract values,
which denote the possible values for all the arguments of that constructor. The
constructed values might also be described by $\top$, which means that the head
constructor could be any constructor, and the arguments may be any value.

Similarly, the possible function closures are described by a map that associates
possible codes of the function to abstract environments. The environments map
free variables of the corresponding function code to abstract values, denoting the
possible concrete values of these variables. The closures might also be described
by $\top$, to represent any closure made from any function code with any environment.

Finally, we can construct recursive sets of values through the use of variables $\alpha$,
which are introduced by the $\mu$ constructor of fixpoints.

The bottom value is $\{\mathrm{ints} : \bot;\ \mathrm{constructs} : \{\};\ \mathrm{funs} : \{\}\}$, and the top value is
$\{\mathrm{ints} : \top;\ \mathrm{constructs} : \top;\ \mathrm{funs} : \top\}$. We may completely omit some of the fields
(ints, constructs or funs) when they are associated with a bottom value.

This informal explanation is formalised in the concretisation function: 


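Definition 3 (Concretisation). Assume $\Gamma$ is a finite mapping from variables
to sets of values. The concretisation $\gamma_\Gamma : \mathbb{A} \to \wp(\mathbb{V})$ is defined by
$\gamma_\Gamma\{\mathrm{ints} : I;\ \mathrm{constructs} : C;\ \mathrm{funs} : F\} = \gamma_\Gamma(I) \cup \gamma_\Gamma(C) \cup \gamma_\Gamma(F)$, where:

\[
\begin{array}{rcl}
\gamma_\Gamma(\alpha) & = & \Gamma(\alpha) \\
\gamma_\Gamma(\mu\alpha.A) & = & \mathrm{lfp}_{\subseteq}\ (\lambda S.\ \gamma_{\Gamma, \alpha : S}(A)) \\
\gamma_\Gamma(I) & = & \gamma_{\mathbb{I}}(I) \\[4pt]
\gamma_\Gamma(C) & = &
\begin{cases}
\{c(v_1, \dots, v_n) \mid c \in \mathcal{C} \land \forall 1 \le i \le n,\ v_i \in \mathbb{V}\} & \text{if } C = \top \\
\{c(v_1, \dots, v_n) \mid (c \mapsto (A_1, \dots, A_n)) \in C \land \forall 1 \le i \le n,\ v_i \in \gamma_\Gamma(A_i)\} & \text{otherwise}
\end{cases} \\[10pt]
\gamma_\Gamma(F) & = &
\begin{cases}
\{(E, \mu f.\lambda x.t) \mid E \in \mathbb{E} \land t \in T\} & \text{if } F = \top \\
\{(E, \mu f.\lambda x.t) \mid (\mu f.\lambda x.t \mapsto E^\sharp) \in F \land E \in \gamma_\Gamma(E^\sharp)\} & \text{otherwise}
\end{cases} \\[10pt]
\gamma_\Gamma(E^\sharp) & = & \{E \mid \mathrm{dom}\,E = \mathrm{dom}\,E^\sharp \land \forall x \in \mathrm{dom}\,E,\ E(x) \in \gamma_\Gamma(E^\sharp(x))\}
\end{array}
\]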


The definition is justified by the fact that the function $\lambda S.\ \gamma_{\Gamma, \alpha : S}(A)$ is
monotonic, and thus has a least fixed point, thanks to the Knaster-Tarski theorem.
This is formalised by the following lemma:


Lemma 1. Consider the inclusion order $\subseteq$ on $\wp(\mathbb{V})$, and its pointwise extension
on environments $\Gamma$. For any abstract value $A$, the function $\lambda\Gamma.\ \gamma_\Gamma(A)$ is monotonic.
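To make the concretisation more tangible, the following OCaml sketch tests whether a concrete value belongs to the concretisation of a closed abstract value. It is a simplified illustration, not the authors' data structure: closures and the $\top$ cases are left out, the numeric domain is assumed to be intervals, fixpoints are unfolded by naive substitution, and all names are hypothetical.

```ocaml
type value = Int of int | Con of string * value list

(* Toy numeric domain (an assumption): intervals, [None] being bottom. *)
type itv = (int * int) option

type aval =
  | Node of { ints : itv; constructs : (string * aval list) list }
  | Var of string          (* fixpoint variable *)
  | Mu of string * aval    (* fixpoint binder *)

(* [subst a x by] replaces the free occurrences of [x] in [a] with [by].
   It is capture-avoiding only because [by] is assumed to be closed. *)
let rec subst (a : aval) (x : string) (by : aval) : aval =
  match a with
  | Var y -> if y = x then by else a
  | Mu (y, body) -> if y = x then a else Mu (y, subst body x by)
  | Node n ->
      let constructs =
        List.map
          (fun (c, args) -> (c, List.map (fun b -> subst b x by) args))
          n.constructs
      in
      Node { n with constructs }

(* [mem v a] decides whether [v] is in the concretisation of the closed
   abstract value [a]. Fixpoints are unfolded on demand; this terminates
   when every fixpoint body eventually exposes a [Node]. *)
let rec mem (v : value) (a : aval) : bool =
  match a, v with
  | Mu (x, body), _ -> mem v (subst body x a)
  | Var _, _ -> invalid_arg "mem: open abstract value"
  | Node n, Int k -> (
      match n.ints with Some (lo, hi) -> lo <= k && k <= hi | None -> false)
  | Node n, Con (c, vs) -> (
      match List.assoc_opt c n.constructs with
      | Some args when List.length args = List.length vs ->
          List.for_all2 mem vs args
      | _ -> false)

(* Lists whose elements lie in [0, 9]. *)
let digit = Node { ints = Some (0, 9); constructs = [] }
let digits =
  Mu ("a", Node { ints = None;
                  constructs = [ ("Nil", []); ("Cons", [ digit; Var "a" ]) ] })
```

The recursion follows the structure of the concrete value, so only finitely many unfoldings are performed for any given value.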


The fact that our abstract values may represent sets of values that might not
all have the same types may seem surprising, since our goal is, ultimately, to
analyse strongly typed programs. The crux of the explanation lies in the fact that
our abstract domain can only represent regular sets of values. If we restricted our
abstract values so that they represent homogeneously typed values, it would be
difficult to represent sets of values that are induced by a non-regular recursive
type (like the type of finger trees [23]) or by generalised algebraic data types
(GADTs). Indeed, one would need to find an over-approximation of such sets, and
we would often approximate with the $\top$ abstract value. The ability to describe
regular sets of values that may not all have the same type gives us more freedom,
and allows us to find more precise approximations. For instance, we can represent
finger trees as a recursive set whose values are either trees or fingers, although
trees and fingers have distinct types. In practice, the $\top$ value is never produced.

We write $A_1[\alpha \leftarrow A_2]$ to denote the capture-avoiding substitution. We write
$\gamma(A)$ for $\gamma_\emptyset(A)$, i.e., when the environment is empty.

The unwinding of fixpoints preserves the concretisation of abstract values. 


Lemma 2 (Unwinding). $\gamma(\mu\alpha.A) = \gamma(A[\alpha \leftarrow \mu\alpha.A])$


To define several operations on abstract values, we restrict them to well-formed
values, using the standard contractiveness property for recursive types [16]:


Definition 4 (Contractiveness). An abstract value $A = \mu\beta_1. \dots \mu\beta_n. A'$ is
$\alpha$-contractive if $n \ge 0$ and $A'$ does not start with $\mu$ and is not the variable $\alpha$.


Well-formedness requires that fixpoints are contractive, that constructors
are used with the correct arity, and that the environments in closures only define
bindings for the free variables of the functions.


Definition 5 (Well-formedness). An abstract value $A$ is well-formed when
the following conditions are satisfied:
- For any $\mu\alpha.A'$ that occurs in $A$, the abstract value $A'$ is $\alpha$-contractive, and
- For any $c \mapsto (A_1, \dots, A_n)$ that occurs in $A$, the arity of $c$ is $n$, and
- For any $\mu f.\lambda x.t \mapsto E^\sharp$ that occurs in $A$, $\mathrm{dom}\,E^\sharp = \mathrm{fv}(\mu f.\lambda x.t)$.


Well-formedness rules out the abstract value $\mu\alpha.\alpha$, whose concretisation is the
empty set. Well-formedness is preserved by substitution, provided contractiveness
for the substituted variable is satisfied. This ensures that unwinding fixpoints
preserves well-formedness. In the rest of this article, we only consider closed,
well-formed abstract values.

For any abstract value $A$, we can retrieve the subset of integer values (respectively,
constructed values, or function closures) by unwinding the top-level $\mu$s if
there are any, and eventually getting the ints field (respectively, constructs, or
funs). This is formalised in the following definition for projection on integers:


Definition 6 (Projection on integers). The projection on integers of a
well-formed abstract value $A$, written $A.\mathrm{ints}$, is defined as follows:

\[
\begin{array}{rcl}
\{\mathrm{ints} : I;\ \mathrm{constructs} : C;\ \mathrm{funs} : F\}.\mathrm{ints} & = & I \\
(\mu\alpha.A).\mathrm{ints} & = & (A[\alpha \leftarrow \mu\alpha.A]).\mathrm{ints}
\end{array}
\]

The definition for projection is well founded, thanks to the contractiveness of
$\mu$s: only a finite number of unwindings is necessary. The projections $A.\mathrm{constructs}$
and $A.\mathrm{funs}$ are defined in a similar way. Projection on integers is sound, as it
over-approximates the set of integers an abstract value contains:
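A possible reading of Definition 6 in OCaml is sketched below. It makes the same simplifying assumptions as before (no closures, no $\top$, intervals as the numeric domain, hypothetical names) and relies on the abstract value being closed and well-formed, so that the unfolding loop stops.

```ocaml
(* Toy numeric domain (an assumption): intervals, [None] being bottom. *)
type itv = (int * int) option

type aval =
  | Node of { ints : itv; constructs : (string * aval list) list }
  | Var of string
  | Mu of string * aval

(* Substitution of a closed abstract value [by] for the variable [x]. *)
let rec subst (a : aval) (x : string) (by : aval) : aval =
  match a with
  | Var y -> if y = x then by else a
  | Mu (y, body) -> if y = x then a else Mu (y, subst body x by)
  | Node n ->
      let constructs =
        List.map
          (fun (c, args) -> (c, List.map (fun b -> subst b x by) args))
          n.constructs
      in
      Node { n with constructs }

(* Projection on integers: unwind the top-level fixpoints, then read the
   [ints] field. Contractiveness bounds the number of unwindings. *)
let rec ints_of (a : aval) : itv =
  match a with
  | Node n -> n.ints
  | Mu (x, body) -> ints_of (subst body x a)
  | Var _ -> invalid_arg "ints_of: open abstract value"

(* mu a. mu b. { ints : [0, 0]; constructs : { S -> (a) } } *)
let nat_like =
  Mu ("a", Mu ("b", Node { ints = Some (0, 0); constructs = [ ("S", [ Var "a" ]) ] }))
```

Here two nested binders are unwound before the record is reached; the ill-formed value `Mu ("a", Var "a")` would make the loop diverge, which is what contractiveness excludes.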


398 P. Lermusiaux, B. Montagu 
Lemma 3 (Soundness of projection on integers). $\gamma(A) \cap \mathbb{Z} \subseteq \gamma_{\mathbb{I}}(A.\mathrm{ints})$


Projections for constructors and closures enjoy similar soundness properties. 


4.1 Inclusion, Union and Intersection 


Following the methodology employed in the context of recursive subtyping, we 
define the inclusion relation between abstract values as a co-inductive relation. 


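Definition 7 (Abstract inclusion). The inclusion between abstract values,
written $A_1 \sqsubseteq A_2$, is defined as a co-inductive relation by the following rules:

\[
\begin{array}{c}
\dfrac{A_1[\alpha \leftarrow \mu\alpha.A_1] \sqsubseteq A_2}{\mu\alpha.A_1 \sqsubseteq A_2}
\qquad
\dfrac{A_1 \neq \mu\beta.A \qquad A_1 \sqsubseteq A_2[\alpha \leftarrow \mu\alpha.A_2]}{A_1 \sqsubseteq \mu\alpha.A_2}
\\[14pt]
\dfrac{I_1 \sqsubseteq_{\mathbb{I}} I_2 \qquad C_1 \sqsubseteq_{\mathbb{C}} C_2 \qquad F_1 \sqsubseteq_{\mathbb{F}} F_2}{\{\mathrm{ints} : I_1;\ \mathrm{constructs} : C_1;\ \mathrm{funs} : F_1\} \sqsubseteq \{\mathrm{ints} : I_2;\ \mathrm{constructs} : C_2;\ \mathrm{funs} : F_2\}}
\\[14pt]
\dfrac{}{C \sqsubseteq_{\mathbb{C}} \top}
\qquad
\dfrac{}{F \sqsubseteq_{\mathbb{F}} \top}
\\[14pt]
\dfrac{\begin{array}{l}
\forall (c \mapsto (A_{11}, \dots, A_{1n})) \in C_1, \\
\exists (c \mapsto (A_{21}, \dots, A_{2n})) \in C_2, \\
\forall 1 \le i \le n,\ A_{1i} \sqsubseteq A_{2i}
\end{array}}{C_1 \sqsubseteq_{\mathbb{C}} C_2}
\qquad
\dfrac{\begin{array}{l}
\forall (\mu f.\lambda x.t \mapsto E^\sharp_1) \in F_1, \\
\exists (\mu f.\lambda x.t \mapsto E^\sharp_2) \in F_2, \\
\forall x \in \mathrm{dom}\,E^\sharp_1,\ E^\sharp_1(x) \sqsubseteq E^\sharp_2(x)
\end{array}}{F_1 \sqsubseteq_{\mathbb{F}} F_2}
\end{array}
\]

In this definition, the relation $\sqsubseteq_{\mathbb{I}}$ is provided by the abstract domain on integers.
The inclusion relation unfolds fixpoints when necessary, and otherwise compares
each field (integers, constructed values, closures) separately, by treating the
finite maps for constructed values and closures as disjunctions, i.e., by using
the standard Hoare ordering. In practice, the inclusion test is implemented by
transforming abstract values into graphs that resemble tree automata: each graph
node corresponds to a sub-term of an abstract value, and $\mu$-nodes create cycles.
Then, it suffices to check whether one automaton simulates the other [IJ3I{16].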
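The co-inductive reading of these rules can be sketched directly, without going through graphs: a pair of abstract values that is already under examination is assumed to be related. The OCaml code below is a naive illustration under that assumption, not the automaton-based implementation described above; closures and $\top$ are omitted, the numeric domain is assumed to be intervals, and all names are hypothetical.

```ocaml
(* Toy numeric domain (an assumption): intervals, [None] being bottom. *)
type itv = (int * int) option

type aval =
  | Node of { ints : itv; constructs : (string * aval list) list }
  | Var of string
  | Mu of string * aval

let rec subst (a : aval) (x : string) (by : aval) : aval =
  match a with
  | Var y -> if y = x then by else a
  | Mu (y, body) -> if y = x then a else Mu (y, subst body x by)
  | Node n ->
      let constructs =
        List.map
          (fun (c, args) -> (c, List.map (fun b -> subst b x by) args))
          n.constructs
      in
      Node { n with constructs }

let itv_leq (a : itv) (b : itv) : bool =
  match a, b with
  | None, _ -> true
  | Some _, None -> false
  | Some (l1, h1), Some (l2, h2) -> l2 <= l1 && h1 <= h2

(* [leq seen a b]: [seen] holds the pairs currently being checked, which
   are assumed to hold (co-induction). Values must be closed. *)
let rec leq (seen : (aval * aval) list) (a : aval) (b : aval) : bool =
  if List.mem (a, b) seen then true
  else
    let seen = (a, b) :: seen in
    match a, b with
    | Mu (x, body), _ -> leq seen (subst body x a) b   (* unfold on the left *)
    | _, Mu (x, body) -> leq seen a (subst body x b)   (* unfold on the right *)
    | Node n1, Node n2 ->
        itv_leq n1.ints n2.ints
        && List.for_all
             (fun (c, args1) ->
               (* Hoare ordering: each left entry needs a larger right entry. *)
               match List.assoc_opt c n2.constructs with
               | Some args2 when List.length args1 = List.length args2 ->
                   List.for_all2 (leq seen) args1 args2
               | _ -> false)
             n1.constructs
    | Var _, _ | _, Var _ -> invalid_arg "leq: open abstract value"

let included a b = leq [] a b

(* Lists of digits, and lists of digits of even length. *)
let digit = Node { ints = Some (0, 9); constructs = [] }
let lists =
  Mu ("b", Node { ints = None;
                  constructs = [ ("Nil", []); ("Cons", [ digit; Var "b" ]) ] })
let even_lists =
  Mu ("a", Node { ints = None;
                  constructs =
                    [ ("Nil", []);
                      ("Cons", [ digit;
                                 Node { ints = None;
                                        constructs = [ ("Cons", [ digit; Var "a" ]) ] } ]) ] })
```

With these definitions, `included even_lists lists` holds while `included lists even_lists` does not. A practical implementation would share the set of visited pairs across branches, which this sketch does not attempt.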


Lemma 4 (Inclusion is a pre-order). The inclusion between closed,
well-formed abstract values is a pre-order, i.e., a reflexive and transitive relation.


Abstract union and intersection are defined in the companion
research report [34] in a similar way, as co-inductive relations that unwind
fixpoints when needed.

The abstract operations enjoy the expected soundness properties: 


Lemma 5 (Soundness of abstract operations). For any closed, well-formed
abstract values $A_1$ and $A_2$:

- $A_1 \sqsubseteq A_2$ implies $\gamma(A_1) \subseteq \gamma(A_2)$, and

- $\gamma(A_1) \cup \gamma(A_2) \subseteq \gamma(A_1 \sqcup A_2)$, and

- $\gamma(A_1) \cap \gamma(A_2) \subseteq \gamma(A_1 \sqcap A_2)$.

The proof of Lemma 5 crucially relies on Lemma 2, which proves that unwinding
a recursive value preserves its concretisation.

Union and intersection are implemented by translating the values into graphs,
on which union and intersection are easily computed. Then, we transform them
back into trees with $\mu$ nodes. Our implementation exploits the locally nameless
representation [5], where bound variables are encoded as de Bruijn indices. We
leverage this canonical representation by hash-consing values and memoising the
operations [13]. This has proved essential to obtain acceptable performance.
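For readers unfamiliar with hash-consing, the general technique can be sketched as follows. This is a generic illustration on a toy term type, not the authors' code: every structurally distinct term is interned once and receives a unique tag, so that equality becomes a tag comparison and operations can be memoised on tags. The table below is a plain `Hashtbl`, whereas real implementations often use weak tables; all names are hypothetical.

```ocaml
(* A hash-consed term carries a unique integer tag. *)
type term = { tag : int; node : node }
and node = Leaf of int | Pair of term * term

module Key = struct
  type t = node
  (* Sub-terms are already interned, so comparing their tags is enough. *)
  let equal a b =
    match a, b with
    | Leaf x, Leaf y -> x = y
    | Pair (a1, a2), Pair (b1, b2) -> a1.tag = b1.tag && a2.tag = b2.tag
    | _ -> false
  let hash = function
    | Leaf x -> Hashtbl.hash (0, x)
    | Pair (a, b) -> Hashtbl.hash (1, a.tag, b.tag)
end

module Table = Hashtbl.Make (Key)

let table : term Table.t = Table.create 97
let counter = ref 0

(* [intern n] returns the unique representative of the node [n]. *)
let intern (n : node) : term =
  match Table.find_opt table n with
  | Some t -> t
  | None ->
      let t = { tag = !counter; node = n } in
      incr counter;
      Table.add table n t;
      t

let leaf x = intern (Leaf x)
let pair a b = intern (Pair (a, b))

(* An operation memoised on tags: the number of nodes of a term. *)
let size_memo : (int, int) Hashtbl.t = Hashtbl.create 97

let rec size (t : term) : int =
  match Hashtbl.find_opt size_memo t.tag with
  | Some s -> s
  | None ->
      let s =
        match t.node with Leaf _ -> 1 | Pair (a, b) -> 1 + size a + size b
      in
      Hashtbl.add size_memo t.tag s;
      s
```

Building the same term twice yields physically equal results, which is what makes the memo table effective.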


4.2 Widening 


The widening, written $A_1 \nabla A_2$, is a binary operator on abstract values that
over-approximates the union of abstract values, and is used to approximate the Kleene
fixpoint iterations. The role of the widening is central in abstract interpretation,
as it serves two purposes. Firstly, the widening must find generalisations of
abstract values, in order to find invariants. This part impacts the precision of
the analysis, and relies on heuristics. Secondly, it must ensure the termination of
the analysis, by enforcing a stability property: every widening chain must reach
a limit in finite time. This part impacts the performance of the analyser.

In our abstract domain, the widening operator is responsible for finding
regularities in abstract values and for creating $\mu$ nodes. A similar idea was used
in the analysis of Prolog programs using type graphs [22], which are trees that
contain cycles. Our widening draws inspiration from type graphs.

We now give the informal procedure to compute the widening of two abstract
values $A_1$ and $A_2$. It operates in two phases. The first phase proceeds as follows:

1. Compute the union $A_{12}$ of $A_1$ and $A_2$ where the widening of the numeric
abstract domain is used, instead of the standard union. This ensures that
the numeric parts of abstract values won't grow indefinitely.

2. Compute $A_{\mathrm{new}}$, which is a minimised version of $A_{12}$. Minimisation is performed
by an algorithm on tree automata, which produces a semantically equivalent
abstract value whose size is smaller.

3. Compare $A_{\mathrm{new}}$ and $A_1$ (viewed as trees):

- If the height of $A_{\mathrm{new}}$ is not greater than the height of $A_1$, return $A_{\mathrm{new}}$;
- If, for each construct and each code of closures, the maximal number of
occurrences in each tree path of $A_{\mathrm{new}}$ is less than the number of those occurrences in
$A_1$, or than a user-provided threshold, return $A_{\mathrm{new}}$;
- Otherwise, go to the shrinking phase.
Steps 2 and 3 allow the size of abstract values to grow enough, before a shrinking
phase starts. In practice, this is important to find precise invariants.
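Step 1 relies on the widening of the numeric parameter domain. The paper does not fix that domain, so the following OCaml sketch assumes intervals and uses the textbook interval widening, which pushes any unstable bound to infinity. It also shows a widened iteration reaching a limit in finitely many steps; every name here is an illustrative assumption.

```ocaml
(* Intervals with optional bounds: [None] stands for an infinite bound. *)
type itv = Bot | Itv of int option * int option

let join (a : itv) (b : itv) : itv =
  match a, b with
  | Bot, x | x, Bot -> x
  | Itv (l1, h1), Itv (l2, h2) ->
      let lo = match l1, l2 with Some x, Some y -> Some (min x y) | _ -> None in
      let hi = match h1, h2 with Some x, Some y -> Some (max x y) | _ -> None in
      Itv (lo, hi)

(* Standard interval widening: a bound that is not stable is dropped. *)
let widen (a : itv) (b : itv) : itv =
  match a, b with
  | Bot, x | x, Bot -> x
  | Itv (l1, h1), Itv (l2, h2) ->
      let lo =
        match l1, l2 with
        | Some x, Some y when x <= y -> Some x   (* lower bound is stable *)
        | _ -> None                              (* otherwise: minus infinity *)
      in
      let hi =
        match h1, h2 with
        | Some x, Some y when y <= x -> Some x   (* upper bound is stable *)
        | _ -> None                              (* otherwise: plus infinity *)
      in
      Itv (lo, hi)

(* Abstract effect of one loop iteration of "i := 0; while ... do i := i + 1". *)
let succ_itv = function
  | Bot -> Bot
  | Itv (l, h) -> Itv (Option.map succ l, Option.map succ h)

let step (x : itv) : itv = join (Itv (Some 0, Some 0)) (succ_itv x)

(* Iterate with widening until stabilisation. *)
let rec iterate (x : itv) : itv =
  let x' = widen x (step x) in
  if x' = x then x else iterate x'

let invariant = iterate Bot
```

Without widening, the iterates [0,0], [0,1], [0,2], ... would never stabilise; with it, the chain stops at the interval from 0 to infinity.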

The shrinking phase, which takes inspiration from the widening operation of
type graphs, tries to shrink $A_{\mathrm{new}}$, by introducing $\mu$ nodes at appropriate positions
to “fold the abstract value on itself”. It proceeds as follows:

1. Find clashes between $A_1$ and $A_{\mathrm{new}}$, i.e., nodes that are reachable through the
same path (possibly unwinding $\mu$ nodes) in the two trees, and such that:

- Either, the two nodes have different sets of head constructors or codes of
functions: this means that the two nodes might differ semantically.
- Or, the two nodes have different depths in the two trees: this means that
some path was followed through a $\mu$-unwinding.
2. If no clash is found, then return $A_{\mathrm{new}}$.
3. If a clash is found, then we try to create a cycle in $A_{\mathrm{new}}$ by merging the
clashing node with one of its ancestors:

- We search for the closest ancestor of the clashing node that is semantically
larger in the sense of the pre-order. If there is such an ancestor, then we
merge it with the clashing node, thus creating a cycle.

- If no such ancestor exists, we search for the closest ancestor that has at
least the same head constructors and function codes as the clashing node,
then we merge it with the clashing node too.

- If no such ancestor exists, then we return $A_{\mathrm{new}}$ unchanged, which allows
the abstract values to grow.

We repeat this operation until no clashing node remains, or until a maximal
number of iterations is reached. In the latter case, we truncate $A_{\mathrm{new}}$, i.e., we
replace some nodes with $\top$, so that it has the same height as $A_1$.


In practice, we could not find any case where the final truncation is needed. We 
have observed that our widening operator finds precise generalisations in practice. 


5 An Abstract Interpreter to Detect Uncaught Exceptions 


To design our abstract interpreter, we took inspiration from the “Abstracting
Definitional Interpreter” approach [I]. This methodology prescribes deriving
an abstract interpreter from a concrete big-step interpreter that computes in a
monad, which is a parameter of the interpreter. Furthermore, the methodology
fosters the use of the open recursive style: the interpreter should be a function that
takes as an extra parameter the function that was intended to be called recursively.
The first aspect (being parameterised by a monad) is motivated by the fact
that one could use a monad that computes over abstractions of values. In §5.1
we present a monad that is an abstraction of the exception monad. It is also an
abstract domain, and is therefore well suited to define an abstract interpreter.
The second aspect (using open recursive style) permits the use of dynamic
fixpoint solvers [59, 63, 12, 24, 6, 30]. Such solvers compute post-fixpoints, i.e.,
over-approximations of solutions of systems of equations over abstract values, for
which the set of equations might be discovered dynamically, while solving the
equations. New equations can be discovered, for instance, when the control flow
of a program depends on its data flow. This is the case of higher-order programs,
as the function that can be called at a given call site can possibly result from
a computation. We present in §5.2 our abstract interpreter as a function that
computes in the abstract exception monad, and is defined in open recursive style.
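The open recursive style itself is independent of abstract interpretation, and can be sketched on an ordinary function. In the OCaml snippet below (an illustration with hypothetical names), `fib_open` receives its own recursive calls as the parameter `self`; `fix` ties the knot naively, while `memo_fix` stands in for a solver that intercepts every recursive call. A real dynamic fixpoint solver would additionally iterate, with widening, until a post-fixpoint is reached, which this sketch does not do.

```ocaml
(* Open recursive style: [self] stands for the recursive calls. *)
let fib_open (self : int -> int) (n : int) : int =
  if n < 2 then n else self (n - 1) + self (n - 2)

(* Plain knot-tying: recovers the ordinary recursive function. *)
let rec fix (f : ('a -> 'b) -> 'a -> 'b) (x : 'a) : 'b = f (fix f) x

(* A stand-in for a solver: every recursive call goes through [self],
   which here only caches the results of the "equations" already solved. *)
let memo_fix (f : ('a -> 'b) -> 'a -> 'b) : 'a -> 'b =
  let table = Hashtbl.create 16 in
  let rec self x =
    match Hashtbl.find_opt table x with
    | Some y -> y
    | None ->
        let y = f self x in
        Hashtbl.add table x y;
        y
  in
  self

let fib_naive = fix fib_open
let fib_memo = memo_fix fib_open
```

The set of arguments on which `self` is called is discovered while computing, which mirrors how equations are discovered dynamically by the solvers mentioned above.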


5.1 The Abstract Exception Monad 


A big-step interpreter for a programming language with exceptions can be defined
in an elegant manner using the exception monad, which we briefly recall. In the
exception monad, a computation is either a success value, or an exception that
carries some value (typically of type exception) from the object language.


    type m β = Success β | Exception 𝕍

    return :: β → m β
    return x = Success x

    (>>=) :: m β₁ → (β₁ → m β₂) → m β₂
    (Success x) >>= f = f x
    (Exception e) >>= f = Exception e


In this monad, the raise function expresses the action of throwing an exception,
while the dispatch function corresponds to the dispatch construct of our
prototype language (§3), and expresses the action of catching an exception.


    raise :: 𝕍 → m β
    raise e = Exception e

    dispatch :: m β₁ → (β₁ → m β₂) → (𝕍 → m β₂) → m β₂
    dispatch (Success x) f g = f x
    dispatch (Exception e) f g = g e


The raise function simply injects its argument into the exception case, whereas
the dispatch function takes two continuations, to handle, respectively, the success
case, and the exception case, by performing a case analysis on the monadic value.
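For readers who prefer OCaml, the same monad can be transcribed as follows. This is a direct sketch of the definitions above; the type of exception payloads is left as a type parameter, `raise_` is named with a trailing underscore to avoid shadowing OCaml's built-in `raise`, and `safe_div` is a hypothetical example.

```ocaml
(* The exception monad: a success value, or an exception carrying a payload. *)
type ('b, 'e) m = Success of 'b | Exception of 'e

let return (x : 'b) : ('b, 'e) m = Success x

let ( >>= ) (m : ('a, 'e) m) (f : 'a -> ('b, 'e) m) : ('b, 'e) m =
  match m with
  | Success x -> f x
  | Exception e -> Exception e   (* exceptions skip the continuation *)

let raise_ (e : 'e) : ('b, 'e) m = Exception e

(* Case analysis with one continuation per case. *)
let dispatch (m : ('a, 'e) m) (f : 'a -> ('b, 'e) m) (g : 'e -> ('b, 'e) m)
    : ('b, 'e) m =
  match m with
  | Success x -> f x
  | Exception e -> g e

(* Hypothetical example: division that raises a string on zero divisors. *)
let safe_div (x : int) (y : int) : (int, string) m =
  if y = 0 then raise_ "division by zero" else return (x / y)
```

For example, `dispatch (safe_div 1 0) return (fun _ -> return 0)` recovers from the exception and yields `Success 0`.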

We can easily define a monad that mimics the behaviour of the exception
monad, with the difference that it deals with abstractions of sets of (possibly
exceptional) values, instead of mere exceptional values. The construction is based
on the observation that $\wp(m\ \beta)$ is isomorphic to $\wp(\beta) \times \wp(\mathbb{V})$, which can itself
be abstracted into $\wp(\beta) \times \mathbb{A}$ by using our abstract domain for sets of values.
Thus, we define the abstract exception monad, written $m^\sharp\ \beta$, as follows:


    type m♯ β = β × 𝔸

    return♯ :: β → m♯ β
    return♯ B = (B, ⊥)

    (>>=♯) :: m♯ β₁ → (β₁ → m♯ β₂) → m♯ β₂
    (B, A) >>=♯ f = let (B′, A′) = f B in (B′, A ⊔ A′)


The return♯ operation records its argument as the set of possible values, and
asserts that no exception is returned: the set of possible exceptions is $\bot$. The
>>=♯ operation retrieves the value part of its monadic argument and passes it to
the continuation. The final value is composed of the value part that was produced
by the continuation, and of the union of the exceptions that might have been
raised by the monadic value and by the evaluation of the continuation. The
functions return♯ and >>=♯ satisfy the monad laws if $(\sqcup, \bot)$ is a monoid.

The fact that $m^\sharp\ \beta$ is a monad does not suffice to use it in an abstract
interpreter, though. We also need $m^\sharp\ \beta$ to be an abstract domain, i.e., one must
decide when two monadic values are included in each other, and how to compute
abstract unions, intersections, and widening.

Interestingly, the monad $m^\sharp\ \beta$ acts as an abstract domain as soon as $\beta$ is
an abstract domain: this is the standard cartesian product of abstract domains,
where operations are defined pointwise. In practice, we only need to consider the
instance $m^\sharp\ \mathbb{A}$, i.e., the domain of exceptional abstract values.
The remaining pieces that are needed to use $m^\sharp\ \beta$ in an abstract interpreter
are the abstract versions of raise and dispatch. They are defined as follows:


    raise♯ :: 𝔸 → m♯ 𝔸
    raise♯ A = (⊥, A)

    dispatch♯ :: m♯ β → (β → m♯ 𝔸) → (𝔸 → m♯ 𝔸) → m♯ 𝔸
    dispatch♯ (B, A) F G = F B ⊔ G A


The raise♯ operation raises a set of possible exceptions, by recording the abstract
value for exceptions in the set of possibly returned exceptions, and by returning
the bottom value, since it can never return any value. It is the dual of return♯.

The dispatch♯ function executes the value continuation on the set of possible
values, and executes the exception continuation on the set of possible exceptions,
and then returns their abstract union in the domain of exceptional values.
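The abstract monad can be prototyped in a few lines once an abstract domain is chosen. The OCaml sketch below makes a strong simplifying assumption: both the value component and the exception component are finite sets of integers (from `Set.Make (Int)`), standing in for the abstract domain $\mathbb{A}$. The names `return_a`, `bind_a`, `raise_a`, `dispatch_a` and `checked_pred` are hypothetical.

```ocaml
(* Stand-in abstract domain (an assumption): finite sets of integers. *)
module S = Set.Make (Int)

(* A monadic value: (possible results, possible exceptions). *)
type am = S.t * S.t

let return_a (b : S.t) : am = (b, S.empty)

(* Run the continuation on the value part; accumulate the exceptions. *)
let bind_a ((b, a) : am) (f : S.t -> am) : am =
  let b', a' = f b in
  (b', S.union a a')

let raise_a (a : S.t) : am = (S.empty, a)

(* Pointwise union: the join of the product domain. *)
let join ((b1, a1) : am) ((b2, a2) : am) : am = (S.union b1 b2, S.union a1 a2)

(* Both continuations are executed, and their results are joined. *)
let dispatch_a ((b, a) : am) (f : S.t -> am) (g : S.t -> am) : am =
  join (f b) (g a)

(* Hypothetical abstract operation: predecessor on naturals, which raises
   the offending value 0 as an exception. *)
let checked_pred (xs : S.t) : am =
  let ok = S.filter (fun x -> x > 0) xs in
  (S.map pred ok, S.inter xs (S.singleton 0))

(* Possible inputs {0, 1, 2}: results {0, 1}, possible exception {0}. *)
let example = bind_a (return_a (S.of_list [ 0; 1; 2 ])) checked_pred

(* Catch the exception and return -1 instead. *)
let handled =
  dispatch_a example return_a (fun e -> return_a (S.map (fun _ -> -1) e))
```

Unlike the concrete `dispatch`, which picks one continuation, `dispatch_a` runs both because an abstract computation may describe successful and exceptional executions at the same time.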

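We can easily show that the abstract operations compute over-approximations
of their counterparts in the exception monad. Assume the type $\beta$ is equipped
with a concretisation function $\gamma_\beta : \beta \to \wp(B)$ for some set $B$. Then, we define the
concretisation for the abstract monad:

\[
\begin{array}{l}
\gamma_{m^\sharp \beta} : m^\sharp\ \beta \to \wp(m\ B) \\
\gamma_{m^\sharp \beta}(B, A) = \{\mathrm{Success}\ b \mid b \in \gamma_\beta(B)\} \cup \{\mathrm{Exception}\ v \mid v \in \gamma(A)\}
\end{array}
\]

The concretisation specifies that the first component of a monadic value describes the
success values, and that the second component describes the possible exceptions.

The soundness results for the abstract operations show that they compute
over-approximations of their concrete counterparts:


Lemma 6. The following inclusions are satisfied:
- $\{\mathrm{return}\ b \mid b \in \gamma_\beta(B)\} \subseteq \gamma_{m^\sharp \beta}(\mathrm{return}^\sharp\ B)$
- $\{m \mathbin{>\!\!>\!\!=} f \mid m \in \gamma_{m^\sharp \beta_1}(M),\ f \in \gamma_{\beta_1 \to m^\sharp \beta_2}(F)\} \subseteq \gamma_{m^\sharp \beta_2}(M \mathbin{>\!\!>\!\!=^\sharp} F)$
- $\{\mathrm{raise}\ v \mid v \in \gamma(A)\} \subseteq \gamma_{m^\sharp \mathbb{A}}(\mathrm{raise}^\sharp\ A)$
- $\{\mathrm{dispatch}\ m\ f\ g \mid m \in \gamma_{m^\sharp \beta_1}(M),\ f \in \gamma_{\beta_1 \to m^\sharp \beta_2}(F),\ g \in \gamma_{\mathbb{A} \to m^\sharp \beta_2}(G)\} \subseteq \gamma_{m^\sharp \beta_2}(\mathrm{dispatch}^\sharp\ M\ F\ G)$
where $\gamma_{\beta_1 \to \beta_2}(F) = \{f \mid \forall X,\ \forall x \in \gamma_{\beta_1}(X),\ f\ x \in \gamma_{\beta_2}(F\ X)\}$.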


5.2 A Monadic Abstract Interpreter in Open Recursive Style 


In this section, we describe our whole-program static analyser. It infers an
over-approximation of the values that a program might compute, and the exceptions
that it might raise, with the possible values they carry. Although it analyses 
programs that can deal with first-class functions, it is not defined as a control-flow 
analyser [60], but rather as an abstract interpreter that performs a value analysis. 
The insight is the following: since functions are first-class citizens in the language, 
a value analysis also infers an approximation of the control flow. A value analysis 
will indeed compute which functions may be called at every call site. 
Our analyser follows the open recursive style, and has the following type: 


\[ (T \to \mathbb{E}^\sharp \to m^\sharp\ \mathbb{A}) \to (T \to \mathbb{E}^\sharp \to m^\sharp\ \mathbb{A}) \]
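Assuming $\mathrm{eval} :: T \to \mathbb{E}^\sharp \to m^\sharp\ \mathbb{A}$, we define $[\![\cdot]\!]^{\mathrm{eval}} :: T \to \mathbb{E}^\sharp \to m^\sharp\ \mathbb{A}$

\[
\begin{array}{rcl}
[\![x]\!]^{\mathrm{eval}}_{E^\sharp} & = & \mathrm{return}^\sharp\ E^\sharp(x) \\[6pt]
[\![c(x_1, \dots, x_n)]\!]^{\mathrm{eval}}_{E^\sharp} & = &
\begin{cases}
\mathrm{return}^\sharp\ \bot & \text{if } E^\sharp(x_i) = \bot \text{ for some } 1 \le i \le n \\
\mathrm{return}^\sharp\ \{\mathrm{constructs} : \{c \mapsto (E^\sharp(x_1), \dots, E^\sharp(x_n))\}\} & \text{otherwise}
\end{cases} \\[12pt]
[\![n]\!]^{\mathrm{eval}}_{E^\sharp} & = & \mathrm{return}^\sharp\ \{\mathrm{ints} : \{n\}\} \\[6pt]
[\![x_1\ \mathit{op}\ x_2]\!]^{\mathrm{eval}}_{E^\sharp} & = & \mathrm{return}^\sharp\ \{\mathrm{ints} : E^\sharp(x_1).\mathrm{ints}\ [\![\mathit{op}]\!]\ E^\sharp(x_2).\mathrm{ints}\} \\[6pt]
[\![\mu f.\lambda x.t]\!]^{\mathrm{eval}}_{E^\sharp} & = & \mathrm{return}^\sharp\ \{\mathrm{funs} : \{\mu f.\lambda x.t \mapsto E^\sharp|_{\mathrm{fv}(\mu f.\lambda x.t)}\}\} \\[6pt]
[\![x\ y]\!]^{\mathrm{eval}}_{E^\sharp} & = & \text{if } E^\sharp(y) = \bot \text{ then } \mathrm{return}^\sharp\ \bot \text{ else} \\
 & & \bigsqcup_{(\mu f.\lambda x.t\, \mapsto\, E') \in E^\sharp(x).\mathrm{funs}}\ \mathrm{eval}\ t\ E'' \\
 & & \text{where } E'' = E', f \mapsto F, x \mapsto E^\sharp(y) \\
 & & \text{and } F = \{\mathrm{funs} : \{\mu f.\lambda x.t \mapsto E'\}\} \\[6pt]
[\![\mathtt{let}\ x = t\ \mathtt{in}\ u]\!]^{\mathrm{eval}}_{E^\sharp} & = & [\![t]\!]^{\mathrm{eval}}_{E^\sharp} \mathbin{>\!\!>\!\!=^\sharp} \lambda v.\ \text{if } v = \bot \text{ then } \mathrm{return}^\sharp\ \bot \text{ else } [\![u]\!]^{\mathrm{eval}}_{E^\sharp, x \mapsto v} \\[6pt]
[\![\mathtt{match}\ t\ \mathtt{with}\ p_1 \Rightarrow t_1 \mid \cdots \mid p_n \Rightarrow t_n]\!]^{\mathrm{eval}}_{E^\sharp} & = & [\![t]\!]^{\mathrm{eval}}_{E^\sharp} \mathbin{>\!\!>\!\!=^\sharp} \lambda v.\ \text{if } v = \bot \text{ then } \mathrm{return}^\sharp\ \bot \text{ else} \\
 & & \bigsqcup_{1 \le i \le n}\ (p_i \prec^\sharp v) \mathbin{>\!\!>\!\!=^\sharp} \lambda E'. \\
 & & \quad \text{if } E' = \bot \text{ then } \mathrm{return}^\sharp\ \bot \text{ else } [\![t_i]\!]^{\mathrm{eval}}_{E^\sharp, E'} \\[6pt]
[\![\mathtt{raise}\ x]\!]^{\mathrm{eval}}_{E^\sharp} & = & \mathrm{raise}^\sharp\ E^\sharp(x) \\[6pt]
[\![\mathtt{dispatch}\ t\ \mathtt{with}\ \mathtt{val}\ x \Rightarrow u \mid \mathtt{exn}\ y \Rightarrow r]\!]^{\mathrm{eval}}_{E^\sharp} & = & \mathrm{dispatch}^\sharp\ [\![t]\!]^{\mathrm{eval}}_{E^\sharp} \\
 & & \quad (\lambda v.\ \text{if } v = \bot \text{ then } \mathrm{return}^\sharp\ \bot \text{ else } [\![u]\!]^{\mathrm{eval}}_{E^\sharp, x \mapsto v}) \\
 & & \quad (\lambda e.\ \text{if } e = \bot \text{ then } \mathrm{return}^\sharp\ \bot \text{ else } [\![r]\!]^{\mathrm{eval}}_{E^\sharp, y \mapsto e})
\end{array}
\]

Fig. 2. Definition of the abstract interpreter.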


It takes as a parameter an analyser, which represents the information that has
been discovered so far on the program, and produces an analyser as output, which
exploits the input analyser to produce more analysis results, which are possibly less
precise. The role of the fixpoint solver is to find a post-fixpoint of this functional.
Similar approaches, which leverage fixpoint solvers to define static analysers, have
been successfully used in other work on static analysis [22, 64, 50, 4].

Our abstract interpreter is defined in Figure 2, where $[\![t]\!]^{\mathrm{eval}}_{E^\sharp}$ denotes the
abstract value of type $m^\sharp\ \mathbb{A}$ obtained by analysing the program $t$ under the
abstract environment $E^\sharp$, and using the analysis function eval for recursive calls.
Importantly, the analyser does not call eval for every recursive call. Instead, eval
is only used when the analyser cannot be called on a strict sub-term. In practice,
this means that eval is only used to analyse function calls. In every other place,
we have the guarantee that the analysis is demanded on a strict sub-term, and a
standard recursive call is performed. This strategy saves time in practice, as it
lightens the burden of the fixpoint solver, which only needs to find post-fixpoints
for function calls rather than for every program point.

To analyse a variable, we return the abstract value found in the environment.

To analyse a construct, we retrieve the abstract values for every argument,
and return the corresponding abstract value for that constructor, or $\bot$ if one of
the arguments is $\bot$, because of the eager semantics.

The analysis of an integer returns this integer injected in the integer domain.
The analysis of binary operations on integers retrieves the integer parts of the
abstract values for the two arguments, and returns the result of the transfer
function from the integer domain for that binary operation.

The analysis of a function mimics the concrete semantics: it returns an abstract 
closure composed of the code of the function and its abstract environment. 

The analysis of function calls is more interesting. If the abstract value for
the argument is $\bot$, then we return $\bot$, because evaluation is eager. Otherwise, we
retrieve all the possible closures for the value at the call position, and analyse their
bodies by extending their environments with the abstract value for the argument,
and with the abstract closure itself (we are dealing with recursive functions). The
final result is the union, at the level of the abstract monad, of the analyses
of all the possible function bodies. Because the bodies of the functions that are
analysed are not strict sub-terms of the original term $x\ y$, we perform an external
recursive call to the analyser, by using the eval parameter.

The analysis of let bindings chains the analyses of its two parts, and, because 
evaluation is eager, checks for emptiness before analysing the second sub-term. 

The pattern matching construct is analysed by first analysing the scrutinee,
and then analysing each branch of the match independently. For each branch,
we retrieve the environment produced by matching the abstract value with the
pattern (written $p \prec^\sharp v$), and then we analyse the code of that branch if the
matching was possible. Then, we take the union, at the level of the abstract
monad, of the analysis results from each branch. Notably, the exceptions that
any branch might raise are reported in the final result. The definition for matching
abstract values against patterns is available in the companion research report [34].

Analysing the raise construct is easy: a call to the raise♯ function suffices.
Finally, the analysis of dispatch amounts to calling the dispatch♯ function from
the abstract monad, on the analysis of the scrutinee, and on two continuations,
which will analyse the codes of the two branches, if they are given non-$\bot$ arguments.


5.3 Soundness of the Abstract Interpreter 


We show that the abstract interpreter of Figure 2 is sound, in the sense that it
computes an over-approximation of the behaviour of programs.


Definition 8 (Behaviour of programs). Let $S$ be a set of evaluation environments:
\[ \mathrm{EVAL}_S\ t = \bigcup_{E \in S} \{\mathrm{Success}\ v \mid E \vdash t \Downarrow_{\mathrm{val}} v\} \cup \{\mathrm{Exception}\ e \mid E \vdash t \Downarrow_{\mathrm{exn}} e\} \]


The behaviour of a program $t$ is a function EVAL that takes a set of evaluation
environments as input, and produces a set of values with a tag that indicates
whether it results from normal or from exceptional evaluation.

Then, the soundness of the abstract interpreter follows: 


Theorem 1 (Soundness). Assume eval is a post-fixpoint, i.e., $[\![t]\!]^{\mathrm{eval}}_{E^\sharp} \sqsubseteq \mathrm{eval}\ t\ E^\sharp$
for every $t$ and $E^\sharp$. Then, $\mathrm{EVAL}_{\gamma(E^\sharp)}\ t \subseteq \gamma_{m^\sharp \mathbb{A}}([\![t]\!]^{\mathrm{eval}}_{E^\sharp})$.

Proof. We have to show that for every $E \in \gamma(E^\sharp)$, $m \in \{\mathrm{val}, \mathrm{exn}\}$ and $v \in \mathbb{V}$,
if $E \vdash t \Downarrow_m v$, then $r \in \gamma_{m^\sharp \mathbb{A}}([\![t]\!]^{\mathrm{eval}}_{E^\sharp})$, where $r = \mathrm{Success}\ v$ when $m = \mathrm{val}$, and
$r = \mathrm{Exception}\ v$ when $m = \mathrm{exn}$. The proof proceeds by induction on the evaluation
judgement, generalising over $m$ and $E$. The only interesting case is the one for
function application, which exploits the induction hypothesis, the post-fixpoint
property of eval and the soundness of abstract inclusion $\sqsubseteq$. All other cases result
from the soundness of the abstract operations and from induction hypotheses.


The soundness theorem assumes that eval is a post-fixpoint, i.e.,
$[\![t]\!]^{\mathrm{eval}}_{E^\sharp} \sqsubseteq \mathrm{eval}\ t\ E^\sharp$. This property is ensured by the soundness of the fixpoint solver, which
always returns a post-fixpoint. The function eval is, indeed, the result of the
fixpoint solver called on the function $\lambda \mathrm{eval}.\ \lambda t.\ \lambda E^\sharp.\ [\![t]\!]^{\mathrm{eval}}_{E^\sharp}$.


6 An Abstract Interpreter for OCaml Programs 


Based on the abstract interpreter of §5, we implemented a static analyser for
OCaml programs (version 4.14), which returns a map from top-level identifiers of
the program to their abstract values. Our prototype and its test suite
are available as a companion artefact [85].

We have implemented several optimisations, that are crucial to obtain decent 
performance. For example, nodes of the analysed AST are indexed by program 
points using unique integers as identifiers. This enables efficient comparison 
of sub-terms and allows using efficient data structures like Patricia trees [53]. 
Moreover, and this is of paramount importance for performance, we perform
hash-consing of abstract values and memoise the operations on these abstract values.

We present in the next sections some key implementation details that we 
needed to analyse OCaml programs. 


6.1 Refinements With Respect to the Formal Presentation 


The abstract interpreter we implemented follows the structure we have presented
in §5, but implements three more refinements, which we purposely elided to
make the presentation easier to follow. A thorough presentation of these refinements
would go beyond the scope of the current paper.


Context sensitivity. Our analyser is context sensitive: we implemented a form
of call site sensitivity, which is akin to an abstraction of the call stack. Following [50],
we retain full sensitivity until the list of call sites becomes maximal, i.e., when
a program point appears more than once in that list, which may indicate a
recursive call to some function. In addition, we always remember the last call
site. In practice, the list of call sites is an additional parameter to the abstract
interpreter. Following [50] again, we use this list of call sites to decide when
widening on the environments should be performed: it is performed only when
eval is called on a maximal list of call sites. The same list of call sites is also used
to derive dynamic exception names and abstract pointers (see §6.4 and §6.5).


Flow sensitivity. Our abstract interpreter is able to exploit information that is
learned when a branch in a match is taken, or when branching on an arithmetic
test. For example, in the program match (x,y) with (None, _) -> x | _ -> t,
our analyser is able to refine the possible environments, by taking into account
that x = None in the first branch, and that this first branch necessarily returns
the value None. This is done by performing a backward analysis of the scrutinee
(x,y). This backward analysis infers an over-approximation of the environment,
knowing that the scrutinee successfully matched against the pattern (None, _).
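
As a rough illustration of this kind of refinement, the sketch below narrows an abstract environment once a variable is known to have matched the pattern None. The abstraction of option values as two booleans, the association-list environment and the function `refine_none` are hypothetical simplifications, far coarser than the abstract domain of the paper.

```ocaml
(* Hypothetical abstraction of an option value: which head constructors
   are still possible. *)
type aopt = { may_none : bool; may_some : bool }

(* Abstract environments as association lists from variable names. *)
type env = (string * aopt) list

(* Refine [env] knowing that variable [x] matched the pattern [None].
   The result is [None] when the branch is unreachable. *)
let refine_none (x : string) (env : env) : env option =
  match List.assoc_opt x env with
  | None -> Some env (* unknown variable: nothing to refine *)
  | Some a ->
    if not a.may_none then None
    else
      let refined = { may_none = true; may_some = false } in
      Some ((x, refined) :: List.remove_assoc x env)
```

In the example match (x,y) with (None, _) -> x | _ -> t, calling `refine_none "x"` on the environment of the first branch records that x can only be None there.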


Dynamic partitioning. Finally, we have employed a form of dynamic partitioning
to avoid conflating some analysis results, which could degrade precision. Based
on a notion of similarity on the shapes of abstract values found in environments, 
we decide whether to conflate contexts or not. The technique is inspired by the 
silhouettes used in shape analysis [39]. 


6.2 Transformation of Typed OCaml ASTs 


The actual language that our interpreter takes as input is more complex than
the one we presented in §3 but undoubtedly simpler than the OCaml AST. The
main difference between our intermediate language and the OCaml AST is that
we deal with only one construct for pattern matching, and only one construct for
exception handling, and that those two constructs implement orthogonal features
in our language. This is in contrast with OCaml’s try t1 with p -> t2 and
match t with p1 -> t1 | exception p2 -> t2, which conflate pattern matching
with exception handling. The transformation into our two constructs is mostly
straightforward, and greatly simplifies the job of the static analyser.
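
To illustrate what separating the two features can look like at the source level, the sketch below writes the same function twice: once with OCaml's combined match ... with exception form, and once with exception handling and pattern matching kept apart. The second version is only an analogy written in plain OCaml; the actual intermediate constructs of the analyser are not shown in this section, and the function names are made up for the example.

```ocaml
(* Combined form: value cases and exception cases in a single match. *)
let parse_or_zero (s : string) : int =
  match int_of_string s with
  | n -> n
  | exception Failure _ -> 0

(* Separated form: a handler first reifies the outcome of the scrutinee,
   then an ordinary pattern matching inspects it. As in the combined form,
   the branch bodies are not protected by the handler. *)
let parse_or_zero_split (s : string) : int =
  let outcome = try Ok (int_of_string s) with Failure _ as e -> Error e in
  match outcome with
  | Ok n -> n
  | Error _ -> 0
```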

Our intermediate language makes the evaluation order explicit using let 
bindings. While the evaluation order in OCaml is generally unspecified, we did 
our best to mimic the choices that the OCaml compiler makes. 

We added specific application nodes for OCaml primitives. To ensure they are
called with the correct arity, we inserted λ-abstractions when they were partially
applied, or additional application nodes when they were given more arguments
than expected. We also gave specific treatment to the short-circuiting primitives on
boolean expressions && and ||, as they change the evaluation order.
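
The three points above (explicit evaluation order, arity of primitives, short-circuiting operators) can be pictured with the following OCaml sketch. It assumes a right-to-left evaluation of operands, which is what current OCaml compilers typically do but which the language does not guarantee; the helper `note` and the other names are hypothetical.

```ocaml
(* Records the order in which sub-expressions are evaluated. *)
let trace : string list ref = ref []

let note (tag : string) (v : 'a) : 'a =
  trace := tag :: !trace;
  v

(* Source expression: note "a" 1 + note "b" 2, whose order is unspecified.
   Let-normalised form, assuming right-to-left evaluation of operands. *)
let sum_explicit () : int =
  let b = note "b" 2 in
  let a = note "a" 1 in
  a + b

(* A partially applied primitive such as (+) 1 is eta-expanded, so that
   the primitive is always called with its full arity. *)
let incr_expanded : int -> int = fun y -> ( + ) 1 y

(* x && y is not a plain call: the second operand is evaluated only when
   the first one is true. Thunks make this explicit. *)
let and_explicit (x : unit -> bool) (y : unit -> bool) : bool =
  if x () then y () else false
```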

We kept the n-ary application nodes of the OCaml AST (instead of the binary
applications from §3), as this is important for the semantics of labelled/optional
function arguments. Nevertheless, the transformation from the OCaml AST
into our intermediate language needed a lot of care and effort. In particular,
missing labelled arguments required the insertion of λ-abstractions, which can
be particularly subtle when interacting with optional arguments.
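
A small OCaml example may help to see why labelled and optional arguments call for inserted abstractions. The function `greet` and its expansion are made up for this illustration; the expansion shown is one plausible reading, not necessarily the exact term the analyser produces.

```ocaml
(* One optional argument, one labelled argument, and a final unit. *)
let greet ?(greeting = "Hello") ~name () : string = greeting ^ ", " ^ name

(* Only ~name is supplied, and out of order: the result is still a
   function waiting for ?greeting and (). *)
let greet_ada = greet ~name:"Ada"

(* The same partial application with the abstraction made explicit:
   the missing arguments become parameters that are passed through. *)
let greet_ada_expanded = fun ?greeting () -> greet ?greeting ~name:"Ada" ()
```

Both definitions have type ?greeting:string -> unit -> string, and the default value is only substituted once the final unit argument is supplied.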


6.3 Pattern Disambiguation 


The last major difference between OCaml and our intermediate language is
the exhaustiveness and non-ambiguity requirements on pattern matching. These
properties not only simplify the semantics of our intermediate language, but also
facilitate the analysis of programs. Indeed, each branch of the pattern matching
can be analysed independently of the other ones, whereas in OCaml, branches
must be considered in order, until one pattern matches the inspected value. The
OCaml type-checker still provides warnings to verify the utility of each branch
and the exhaustiveness of the overall pattern matching.

Enforcing exhaustive and non-ambiguous pattern matching in OCaml would
require the use of cumbersome patterns and, furthermore, it is not always possible
to write such patterns in OCaml. It is, indeed, allowed to match on values whose
types may have infinitely many constructors, e.g., arrays, strings, or extensible
variant types (see §6.4 for details). To meet these requirements, we extend the
language of patterns with a complement $p \setminus q$ [8]. A value $v$ matches a pattern
$p \setminus q$ if and only if it matches $p$ but not $q$. In an ordered pattern matching
match $t$ with $p_1 \Rightarrow u_1 \mid \cdots \mid p_n \Rightarrow u_n$, we can express that the value $v$ of the
term $t$ matches the $i$-th pattern, unambiguously. It suffices to add that $v$ does not
match any of the preceding patterns $p_j$ with $j < i$, i.e., $p_i \setminus (\sum_{j<i} p_j) \prec\!\!\prec v$.
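
The semantics of complements and the disambiguation of an ordered list of patterns can be sketched in OCaml as follows. This only implements the matching relation and the construction of the patterns $p_i \setminus (\sum_{j<i} p_j)$; it does not perform the reduction to complement-free patterns. The `value` and `pat` types are hypothetical simplifications of the pattern language.

```ocaml
(* Values: a constructor name applied to sub-values. *)
type value = V of string * value list

type pat =
  | Wild                      (* matches any value *)
  | Con of string * pat list  (* constructor pattern *)
  | Or of pat * pat           (* disjunction p + q *)
  | Diff of pat * pat         (* complement p \ q *)

let rec matches (p : pat) (v : value) : bool =
  match p, v with
  | Wild, _ -> true
  | Con (c, ps), V (d, vs) ->
    c = d && List.length ps = List.length vs && List.for_all2 matches ps vs
  | Or (p1, p2), _ -> matches p1 v || matches p2 v
  | Diff (p1, p2), _ -> matches p1 v && not (matches p2 v)

(* Turn an ordered list of patterns into a non-ambiguous one: the i-th
   pattern becomes p_i \ (p_1 + ... + p_(i-1)). *)
let disambiguate (ps : pat list) : pat list =
  let rec go (seen : pat option) (rest : pat list) : pat list =
    match rest with
    | [] -> []
    | p :: tl ->
      let p' = match seen with None -> p | Some s -> Diff (p, s) in
      let seen' = match seen with None -> Some p | Some s -> Some (Or (s, p)) in
      p' :: go seen' tl
  in
  go None ps
```

After disambiguation, any value is matched by at most one pattern of the list, so the branches can be examined in any order.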

The method presented in [8] shows how to solve the disambiguation problem [32].
It relies on the notion of pattern semantics $\llbracket p \rrbracket$, that is, the set of values
matched by a pattern: $\llbracket p \rrbracket = \{ v \in V \mid p \prec\!\!\prec v \}$. The idea is to reduce any pattern
$p$ into a purely disjunctive pattern $q$, i.e., a pattern containing no complements $\setminus$,
while preserving its semantics: $\llbracket p \rrbracket = \llbracket q \rrbracket$. The reduction relies on rewriting rules
that correspond to algebraic laws of set theory: a constructor $c$ behaves like a
labelled cartesian product, the disjunction $+$ like set union, and the complement $\setminus$
like set difference. Note that the pattern language proposed in [8] conflates the
different forms of OCaml constructors (constructor variant, polymorphic variant,
records, arrays and tuples) as they behave similarly w.r.t. their semantics.

In order to fully reduce a pattern, the method also relies on the observation
that a variable $x_\tau$ of a variant type $\tau$ must be matched by a value whose
head is a constructor of the type $\tau$. Therefore, the semantics of this variable
$x_\tau$ can be described as the union of the semantics of all constructor instances
of $\tau$: $\llbracket x_\tau \rrbracket = \bigcup_{c \in C_\tau} \llbracket c(x_1, \ldots, x_n) \rrbracket$, where $C_\tau$ is the finite set of constructors of
co-domain $\tau$. Similarly, the utility approach, implemented in the OCaml
compiler, relies on the ability to enumerate all the constructors of a type to
provide a non-ambiguous description of the useful patterns. For types that may
not be finitely described, the semantic approach can still be used to partially
reduce the complements [7]. We keep anti-patterns (patterns of the form $x \setminus q$
where $q$ contains no complements) when there exists a value $v$ such that $x \setminus q \prec\!\!\prec v$.

Finally, to guarantee the exhaustiveness of pattern matching, it suffices to add
a rule $x \setminus (p_1 + \cdots + p_n) \Rightarrow$ raise Match_failure when necessary. Again, generating
such a non-ambiguous rule, for data types that may not be finitely described, is
only possible thanks to pattern complements.


6.4 Dynamic Exceptions 


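The exception type in OCaml is an extensible variant type: it can be dynamically
extended with new variant constructors. This means that new exception constructors
are dynamically generated during the execution of programs. Although
this section focuses on the exception type, the techniques we present apply to
any extensible variant type as well.


$$t ::= \cdots \mid \mathtt{let\ exception}\ \hat{c}\ \mathtt{of}\ \tau_1 * \cdots * \tau_n\ \mathtt{in}\ t \mid \mathtt{let\ exception}\ \hat{b} = \hat{c}\ \mathtt{in}\ t$$
$$v ::= \cdots \mid d \mid d(v_1, \ldots, v_k)$$

DynamicConstruct
$$\frac{E(\hat{c}) = d}{S; E \vdash \hat{c}(x_1, \ldots, x_n) \Downarrow S; d(E(x_1), \ldots, E(x_n))}$$

LetException
$$\frac{S \uplus \{d\}; E, \hat{c} \mapsto d \vdash t \Downarrow S'; v}{S; E \vdash \mathtt{let\ exception}\ \hat{c}\ \mathtt{of}\ \tau_1 * \cdots * \tau_n\ \mathtt{in}\ t \Downarrow S'; v}$$

RebindException
$$\frac{S; E, \hat{b} \mapsto d \vdash t \Downarrow S'; v \qquad E(\hat{c}) = d}{S; E \vdash \mathtt{let\ exception}\ \hat{b} = \hat{c}\ \mathtt{in}\ t \Downarrow S'; v}$$

$$A ::= \{\ldots;\ \mathit{names} : V\} \mid \alpha \mid \mu\alpha. A \quad \text{(Abstract value)}$$
$$V ::= \{(c, \delta)\} \quad \text{(Abstract names)}$$

$$\llbracket \mathtt{let\ exception}\ \hat{c}\ \mathtt{of}\ \tau_1 * \cdots * \tau_n\ \mathtt{in}\ t \rrbracket^{\delta}_{E} = \llbracket t \rrbracket^{\delta}_{E, \hat{c} \mapsto \{\mathit{names} = \{(c, \delta)\}\}}$$
$$\llbracket \mathtt{let\ exception}\ \hat{b} = \hat{c}\ \mathtt{in}\ t \rrbracket^{\delta}_{E} = \llbracket t \rrbracket^{\delta}_{E, \hat{b} \mapsto E(\hat{c})}$$


Fig. 3. Changes to support dynamic exception naming (excerpts).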

To model the dynamic behaviour of type extension, we introduce dynamic
constructors, written $\hat{c}$, that, unlike static constructors $c$, are dynamically associated
to a variant name $d$ during the evaluation. We update the language of §3
and its semantics to support these dynamic constructors (Figure 3).

The let exception $\hat{c}$ of $\tau_1 * \cdots * \tau_n$ in $t$ construct defines the new exception
constructor $\hat{c}$, that is dynamically bound to a fresh variant name in the sub-term
$t$. The exception alias construct let exception $\hat{b} = \hat{c}$ in $t$ defines the exception
constructor $\hat{b}$, that is bound in the sub-term $t$ to the variant name of $\hat{c}$. Constructed
values can now have a dynamic variant name as their head constructor.
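
Both constructs exist in source OCaml, and their generative behaviour can be observed directly. In the sketch below (names such as `fresh` and `payload` are made up for the example), each call to `fresh` evaluates a let exception and therefore creates a distinct exception constructor, while exception B = A makes B an alias that shares the variant name of A.

```ocaml
(* Each call evaluates [let exception E] anew, so every returned pair
   refers to its own, fresh constructor E. *)
let fresh () : (unit -> exn) * (exn -> bool) =
  let exception E in
  ((fun () -> E), (function E -> true | _ -> false))

let make1, is1 = fresh ()
let make2, is2 = fresh ()

(* Exception rebinding: B denotes the same constructor as A. *)
exception A of int
exception B = A

let payload (e : exn) : int option =
  match e with
  | B n -> Some n
  | _ -> None
```

The two values built by `make1` and `make2` print with the same constructor name but do not match each other's pattern, which is why a static name alone cannot identify a dynamic constructor.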

To account for the generative aspect of dynamic constructors, the evaluation
rules now carry an execution state $S$, that contains the set of the already generated
variant names. These are akin to the time-stamps from the CFA literature [25,44],
that are used to allocate data in memory locations. In the analysis, we use an
over-approximation $\delta$ of the list of call sites, which we already used in §6.1 to control
the widening strategy, to give abstract names $(c, \delta)$ to dynamic constructors.

Finally, as the variant name of an exception constructor is resolved dynamically,
the pattern matching relation depends on the evaluation environment $E$:
$\hat{c}(p_1, \ldots, p_n) \prec\!\!\prec d(v_1, \ldots, v_n)$ if and only if $E(\hat{c}) = d$, and $p_i \prec\!\!\prec v_i$ for all $i \in [1, n]$.

As the exception type is extensible, a finite number of constructor patterns
never forms an exhaustive set of patterns for the exception type. Therefore, the
utility approach on pattern matching [40] used in OCaml for exhaustiveness checking
cannot provide an exhaustive list of non-ambiguous counter-examples: that
list is not known statically. In contrast, the disambiguation approach from §6.3 is
particularly well suited to such types, by leveraging anti-patterns [7]. Moreover,
the equality of two exception constructors $\hat{b}$ and $\hat{c}$ of the same arity can only be
resolved dynamically. Therefore, there is no way to statically prove, or disprove,
the utility of a pattern $\hat{b}(q_1, \ldots, q_n)$ against a pattern $\hat{c}(p_1, \ldots, p_n)$. On the other
hand, in our pattern formalism, we can simply write $\hat{b}(q_1, \ldots, q_n) \setminus \hat{c}(p_1, \ldots, p_n)$
to guarantee the non-ambiguity between the two.


Fig. 4. Changes to support mutable records (excerpts).


6.5 Mutable Records and Global State 


OCaml supports mutable records. While immutable records can be modelled in
the programming language of §3 in the form of constructs (an immutable record
is a variant with a single case), mutable records require extending the semantics
with a global memory heap S (Figure 4).

Heaps are maps from memory locations $\ell$ to record blocks. Record blocks are
structured memory blocks, that contain values for all the registered fields of the
record. The standard notion of reference can be modelled as a mutable record
with a single field. This is exactly how the type of references is defined in OCaml.

We adapt the big-step semantics in a standard way, so that it takes a heap as 
input and returns an updated heap as output. The evaluation rules for record 
creation, access, and update, either query or modify the memory heap as expected. 

OCaml features pattern matching on mutable records. We adapt the rules for 
pattern matching, so that matching on a mutable record first queries the memory 
heap to retrieve the values for the fields of the record, before matching continues. 
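
As a small source-level illustration, the sketch below defines a reference-like type as a mutable record with a single field, mirroring the definition of ref in the standard library, and reads it through a record pattern. The type `cell` and the helper names are made up for the example.

```ocaml
(* Same shape as the standard library's ['a ref]. *)
type 'a cell = { mutable contents : 'a }

let make (x : 'a) : 'a cell = { contents = x }
let set (c : 'a cell) (x : 'a) : unit = c.contents <- x

(* Matching on a mutable record reads the field when the match runs,
   so the result depends on the current state of the heap. *)
let read (c : 'a cell) : 'a =
  match c with
  | { contents = v } -> v

let counter = make 0
let before = read counter
let () = set counter 5
let after = read counter
```

Here `before` and `after` differ although both come from the same pattern on the same record, which is why the matching rules have to consult the heap.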

To analyse programs that involve mutable records, we add a new field to
abstract values, that contains the possible abstract locations $\ell^\sharp$ a value might be
equal to. Abstract locations denote sets of concrete locations. Similarly to the
dynamic extension constructors of §6.4, the names of fresh abstract locations are chosen by
following a naming scheme that is based on the abstract call stack.

The abstract interpreter is easily adapted to support global state, by lifting 
the abstract exception monad to the state monad, where states are abstract heaps. 
Abstract heaps map abstract locations to abstract record blocks, that themselves 
map record fields to abstract values. The operations on abstract heaps and the 
transfer functions on records are standard, and elided from the presentation. 


6.6 Modules and Functors 


The OCaml language includes an expressive module system [36], that supports 
hierarchical structures, higher-order functors, and first-class modules. In this 
section, we give the reader the main insights for the analysis of OCaml modules. 

First, we consider an untyped semantics of modules, i.e., we do not propagate 
type information. In particular, we do not take type abstraction boundaries into 
account. We are careful, however, to keep track of module coercions: signature
ascriptions may have, indeed, a computational content, as they can remove some 
module fields. Coercions are automatically applied at functor applications to 
“reshape” the functor argument. Coercions distribute on functors, contravariantly 
on their formal arguments, and covariantly on their results. 
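
The computational content of coercions is visible in ordinary OCaml code. In the sketch below, whose module names are hypothetical, a signature ascription hides a field, and a functor application implicitly coerces its argument to the expected signature.

```ocaml
module Full = struct
  let x = 1
  let y = 2
end

(* Signature ascription: the field x is no longer part of Narrow. *)
module Narrow : sig
  val y : int
end =
  Full

(* The functor only expects a field y. *)
module Double (A : sig
  val y : int
end) =
struct
  let twice = 2 * A.y
end

(* Full is implicitly coerced to the argument signature of Double. *)
module R = Double (Full)
```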

Embracing further the untyped nature of our approach, we made the choice of 
having a single class of values, that comprises both values from the core language 
and values for module structures and functor closures. This simplifies both the 
concrete semantics (for example, transfers from the module language to the core 
language and back are no-ops), and the design of the abstract domain. As we 
sketched in the previous sections, it suffices to add new fields to abstract values 
to describe the possible structures and functor closures. 

We represent structures as unordered records, i.e., maps from field names to 
values. Functor closures hold the functor code, an environment, and coercions for 
the argument and the result, that shall be applied when the functor is called. 

Importantly, the support of dynamic exceptions (§6.4) was required to support 
functors, since an exception might be declared in a functor’s body: this leads to 
the creation of a fresh exception every time this functor is instantiated. 
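
This behaviour can be observed with a functor whose body declares an exception. The sketch below uses a generative functor named `Make`, a name chosen only for this example.

```ocaml
module Make () = struct
  exception Local

  let is_local (e : exn) : bool =
    match e with
    | Local -> true
    | _ -> false
end

(* Two instantiations create two distinct exceptions named Local. *)
module A = Make ()
module B = Make ()
```

The pattern in `A.is_local` recognises `A.Local` only: `B.Local` comes from another instantiation and is a different constructor at run time.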

The analysis functions for the core language and the module language, of
types $T \to E \to A$ and $M \to E \to A$, are mutually recursive. Still, the approach of
using a fixpoint solver to define our abstract interpreter remains applicable. The
two functions can be transformed into a single function of type $(T + M) \to E \to A$,
then given to the solver, and split back into two functions. Our untyped approach
was again crucial, as we could keep a single type of abstract values, and a single
type of abstract environments, which made the previous transformation possible.
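
The merge-then-split transformation can be sketched on a toy pair of mutually recursive functions. In the OCaml code below, a naive fixpoint operator `fix` stands in for the solver, and the `input` type plays the role of the sum $T + M$; all names are hypothetical, and the environment argument is omitted for brevity.

```ocaml
(* Inputs of the merged function: a tagged union of both argument types. *)
type ('t, 'm) input = Term of 't | Modu of 'm

(* Two open-recursive functions: each receives the other as a parameter. *)
let open_even (odd : int -> bool) (n : int) : bool = (n = 0) || odd (n - 1)
let open_odd (even : int -> bool) (n : int) : bool = (n <> 0) && even (n - 1)

(* A naive fixpoint operator, used here in place of a real solver. *)
let rec fix f x = f (fix f) x

(* Merge both functions into a single one over the sum type. *)
let both : (int, int) input -> bool =
  fix (fun self ->
      let even n = self (Term n) and odd n = self (Modu n) in
      function
      | Term n -> open_even odd n
      | Modu n -> open_odd even n)

(* Split the result back into two functions. *)
let even (n : int) : bool = both (Term n)
let odd (n : int) : bool = both (Modu n)
```

The merge works here because both functions share the same result type, which echoes the role played by the single type of abstract values in the text.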


7 Experiments 


We tested our prototype analyser for OCaml programs on 290 programs, which
range from small, manually written programs to larger examples extracted
from the literature or from the OCaml compiler’s test suite. The test programs
include some classic functions such as the factorial function, Takeuchi’s
function, McCarthy’s 91 function, fixpoint combinators, programs that compute
over Church numerals, transformations of abstract syntax trees for arithmetic
expressions or logical formulas, and the algorithm for Knuth-Bendix completion of
rewriting systems. The test suite covers a large array of coding styles, e.g., direct
style, continuation-passing style, monadic style, or imperative style, and exhibits
different language features, e.g., assertions, exception-based control flow, GADTs
and non-regular types, polymorphic recursion, second-order polymorphism, etc.


Table 1. Experiments: size of the programs and analysis time (with minimisation
disabled, and enabled). Programs are sorted by decreasing size.

Program                    Size (LoC)   Analysis (w/o minim.)   Analysis (w/ minim.)
heintze_mcallester_1000          4002                   0.2 s                  0.2 s
boyer                            1292                    26 m                   57 m
kb                                552                   1.2 s                  1.4 s
map_merge                         152                   4.5 s                  5.6 s
sliding_window                    122                   4.4 s                  5.8 s
skolemize                          82                    38 m                  2.9 s
negative_normal_form               64                    40 m                  4.6 s
red_black_trees2                   64                   0.5 s                  1.0 s
church                             20                  0.05 s                 0.07 s
sieve                              19                  0.01 s                 0.01 s
tak_cps                             8                  0.04 s                 0.04 s
tak                                 4                < 0.01 s                 0.01 s
mc91_cps                            4                < 0.01 s               < 0.01 s
mc91                                2                < 0.01 s               < 0.01 s

We present in Table 1 a selection of the test results on some key examples.
The complete test results are reproducible via the companion artefact [35]. The 
experimental results are encouraging, both in terms of performance and precision. 

In terms of precision, our analyser infers the best achievable abstract values
on several programs: for McCarthy’s 91 function mc91, the result is shown to be
greater than or equal to 91; for the skolemisation of logical formulas skolemize, the analyser
correctly infers the form of returned terms, i.e., they cannot contain existential
quantifiers. For other programs, the analyser only infers an over-approximation:
for the red_black_tree program, it correctly infers the general shape of trees,
but cannot infer the structural invariant that no red node has red children.
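
For reference, McCarthy's 91 function can be written in two lines of OCaml. The definition below is the textbook one and may differ in detail from the mc91 benchmark of the test suite.

```ocaml
(* Returns n - 10 when n > 100, and 91 otherwise. *)
let rec mc91 (n : int) : int =
  if n > 100 then n - 10 else mc91 (mc91 (n + 11))
```

Every result is at least 91, which corresponds to the lower bound mentioned above.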

The map_merge example calls the Map.Make functor of finite maps from the
standard library, builds several maps, and calls the merge function on those
maps. The merge function has the following signature:


val merge: (key -> 'a option -> 'b option -> 'c option) ->
'a M.t -> 'b M.t -> 'c M.t


Its first argument specifies what should be done when a key/value pair is found
in one of the maps, or in both. This argument is never called for keys that are
absent in both maps, i.e., the case where the second and third arguments are both
equal to None is unreachable. OCaml programmers often write assert false
in the corresponding pattern matching branch. The analyser infers that the
Assert_failure exception is never raised by this branch, which means that the branch
cannot be reached. The analyser cannot show, however, that every assertion
present in Map.Make is satisfied: in the re-balancing function for pseudo-balanced
trees, assertion failures are reported, because the analyser cannot infer that the
heights that are recorded in the trees are strictly positive.
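
A typical use of merge with an unreachable assert false branch looks as follows. This example is written for illustration and is not the map_merge benchmark itself; it assumes OCaml 4.08 or later for the Int module, and `union_sum` is a made-up name.

```ocaml
module M = Map.Make (Int)

(* Union of two maps, adding the values of the keys present in both. *)
let union_sum (a : int M.t) (b : int M.t) : int M.t =
  M.merge
    (fun _key x y ->
      match x, y with
      | Some u, Some v -> Some (u + v)
      | Some u, None | None, Some u -> Some u
      | None, None -> assert false (* never called for absent keys *))
    a b
```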

In terms of performance, most examples, and even some large programs, are 
analysed in a couple of seconds, or in less than a second. In contrast, some 
examples like boyer need approximately one hour for the analysis to terminate. 
boyer is a tautology checker, that is run on a large formula (its definition takes 
about 1000 lines). This formula, of mutable type, requires the creation of several 
hundreds of abstract pointers, which makes abstract operations on abstract heaps 
very costly. If we reduce context sensitivity to “the last call site”, fewer abstract
pointers are created, and the analysis completes in 31 s. This suggests that context 
sensitivity choices for naming abstract pointers need further investigation. 

Our experiments show that the minimisation of abstract values during widening
and unions (§4.2) may impact performance positively or negatively. For
instance, for AST transformations like skolemize and negative_normal_form, 
minimisation decreases the analysis time from about 45 m down to a few seconds. 
For boyer, however, minimisation incurs a heavy cost, as it doubles the analysis 
time. Further investigations are needed to reduce the cost of minimisation. 


8 Related Work 


The static detection of uncaught exceptions for ML programs has been the topic
of much related work. We only discuss a selection of it, and some results on
static analysis of functional programs that are also relevant to the current work.


Set Constraints. Several static analyses for functional programs were based on set
constraints [21]. The principle is to transform a program into a constraint, that
features unions, intersections, negations, and a form of conditional constraint.
Then, the constraint is simplified and given to a solver, from which the analysis
result is obtained. Fähndrich and his coauthors built an exception analysis tool
that infers types and effects for SML programs using the BANE constraint
analysis engine, using a mix of set constraints and type constraints.


Type and Effect Systems. Pessaux and Leroy have developed ocamlexc
[38,54,55], a tool that detects uncaught exceptions in OCaml programs. They
use a type and effect system to analyse programs modularly. Their analyser
extends unification-based type inference, and makes use of row variables
and polymorphism to produce precise types for functions. They type variants
structurally using equi-recursive types. Recursion may also occur through the
effect annotations on arrow types. They also describe an algorithm to improve the
accuracy of their analysis, which uses polymorphic recursion for row variables. The
programming language Koka [33] also leverages row variables to type algebraic
effects. Recently, de Vilhena and Pottier [62] devised a type system based on row
variables for a language that supports the dynamic creation of algebraic effects.


Control-Flow Analyses. An important family of analyses for higher-order 
programs are control-flow analyses (CFA) [60655145019]. The goal of CFA is to 
determine which functions might be called at a call site, and on which arguments. 
CFA can be expressed as instances of abstract interpretation [4644147150]. CFA 
can easily be extended to analyse exceptions. Yi developed an abstract interpreter 
that detects uncaught exceptions in SML [68,67,66]. It implements an analysis
that is close to a 0-CFA analysis extended to support exceptions. 


Abstract Domains in CFA. Most previous work on CFA shares a common
representation for abstract values: although they need to represent some inductively
defined sets, they refrain from using a native device to express fixpoints,
such as our μ constructor. Instead, cyclic definitions are encoded using indirections
through abstract pointers, that point to an abstract heap. For example, the
inductive set of continuations presented earlier is expressed as follows in CFA domains:

$$\{ \mathit{funs} : \{ (\lambda x.\, x) \mapsto \{\};\ (\lambda x.\, k\ (x \times i)) \mapsto \{ i \mapsto p_i;\ k \mapsto p_k \} \} \}$$

where:

$$h(p_i) = \{ \mathit{ints} : [1, +\infty] \}$$
$$h(p_k) = \{ \mathit{funs} : \{ (\lambda x.\, x) \mapsto \{\};\ (\lambda x.\, k\ (x \times i)) \mapsto \{ i \mapsto p_i;\ k \mapsto p_k \} \} \}$$


In this abstract value, the closures’ environments contain the pointers $p_i$ and $p_k$,
that are defined in the abstract heap $h$. This abstract heap contains a cycle, since
$p_k$ is used in the definition of the abstract value pointed to by $p_k$. This is in contrast
to our approach, where we make use of μ nodes to introduce cycles directly,
without referring to a heap. We only use the abstract heap for mutable data. In
CFA domains, all data (constructs, closures, etc.) are “abstractly allocated” in
the abstract heap, regardless of whether they are mutable or not.

A benefit of the approach with heap indirections is that abstract values have
a bounded height, and cycles need no special treatment: the equality of abstract
pointers is used to compute on abstract values. While this makes the operations
of CFA abstract domains easy to define, using pointer names drastically limits the
detection of semantically equivalent values. We argue that our approach allows
detecting more semantic inclusions, therefore decreasing the number of iterations
of the analysers, at the cost of more complex abstract domain operations.


Tree Grammars. Several analyses for functional languages have been defined
using tree grammars. For example, Reynolds [58] defined an analysis for pure
first-order LISP using data sets, i.e., tree grammars that denote the possible outputs of
function symbols. Extended tree grammars, i.e., grammars with selectors of the
form $X \to Y.\mathit{hd}$, have been used by Jones and his coauthors to analyse full LISP
[28], and, later, strict and lazy λ-calculi [26,27]. From a λ-term, they produce tree
grammars with selectors, that denote the possible inputs and outputs of function
symbols. Selectors can then be eliminated in order to simplify the grammars.
Deterministic tree grammars have been identified as an abstract domain to recast
analyses based on set constraints into the abstract interpretation framework [10].


Tree Automata. Generalising string automata, tree automata are an established
formalism to represent sets of trees. They have been used to define static analysers
for term-rewriting systems (TRSs) [3] and higher-order programs [20]. They have
been extended to lattice tree automata to support arbitrary non-relational abstract
domains at their leaves [17,18], and improve the performance of analysers for
TRSs. Recently, tree automata were combined with relational numeric abstract
domains [29], to express relations between scalar data contained in trees. Recent
work reports on the design of relational domains for algebraic data types [BIGI].


Cyclic Abstract Domains. Type graphs [22] are a form of deterministic tree 
grammars, that are represented as cyclic graphs with no sharing, i.e., trees with 
cycles. They have been used to analyse Prolog programs. We used a similar graph- 
based representation as an intermediate form to compute union, intersection and 
widening. We use, however, a term-based representation with binders as our main 
representation, as it allows easy and efficient hash-consing and memoisation [13].
Our widening operator (§4.2) is inspired by the one from type graphs. 

Mauborgne [42,43,41] studied graph-based abstract domains for sets of trees,
and defined ways to have minimal, canonical representations of such abstract 
values. Using Mauborgne’s structures natively could improve our analyser’s 
performance, as we could avoid translating back and forth from terms to graphs. 

Finally, recursive types were a strong inspiration for our abstract domain.
Recursive types have been thoroughly studied in the context of subtyping
[L6BIH], where polynomial algorithms have been devised to decide inclusion.
They proceed by translating types into variants of tree automata, that can also
deal with the contravariance of arrow types.


Fixpoint Solvers. To the best of our knowledge, Le Charlier and Van Hentenryck [6]
were the first to exploit dynamic fixpoint solvers to define static analysers.
They used the top-down solver to analyse Prolog programs. The same approach
has been followed for the Goblint static analyser for C programs [64,59], and for
the analysis of WebAssembly programs [4]. Recent work introduced combinators
to define dynamic fixpoint solvers in a modular manner [80]. Several dynamic
fixpoint solvers have been successfully formally verified [24163].


9 Conclusive Remarks and Future Work 


We have introduced a λ-calculus that features pattern matching primitives and
exception handling, in which exceptions are first-class citizens. We have presented
a static analysis for this language, in the form of a monadic abstract interpreter,
that can be used as an effective static analyser. This analyser detects uncaught
exceptions, and provides a description of the values that a program may return.
The abstract interpreter relies on a generic abstract domain, that is parameterised
over a domain for scalars, and that can represent regular sets of values of our
programming language. This is achieved by a fixpoint constructor in the syntax
of abstract values, that denotes an inductive set of values.


The abstract interpreter is defined in an open recursive style, where the 
recursive knot is tied by calling a dynamic fixpoint solver. Importantly, the 
analyser does not call the solver for every recursive call: it performs standard 
recursive calls on strict sub-terms, but calls the solver to analyse function calls. 
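
The open recursive style can be illustrated on a much simpler function than an abstract interpreter. In the OCaml sketch below, the recursive calls of `open_fib` go through a parameter `self`, and the knot is tied either naively or through a memoising operator. The memoising operator `memo_fix` is only a stand-in chosen for this example: a dynamic fixpoint solver additionally iterates until stabilisation and applies widening, which this sketch does not do.

```ocaml
(* Open-recursive definition: recursive calls go through [self]. *)
let open_fib (self : int -> int) (n : int) : int =
  if n < 2 then n else self (n - 1) + self (n - 2)

(* Naive way of tying the knot. *)
let rec fix f x = f (fix f) x

(* Tying the knot through a table: each input is computed only once. *)
let memo_fix (f : ('a -> 'b) -> 'a -> 'b) : 'a -> 'b =
  let table = Hashtbl.create 16 in
  let rec self x =
    match Hashtbl.find_opt table x with
    | Some y -> y
    | None ->
      let y = f self x in
      Hashtbl.add table x y;
      y
  in
  self

let fib_naive : int -> int = fix open_fib
let fib_memo : int -> int = memo_fix open_fib
```

The same open definition is reused unchanged by both operators, which is the property that lets an interpreter choose where a solver is called and where a direct recursive call is enough.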


Based on this approach, we implemented a static analyser for OCaml programs. 
We presented some extensions of our formalism to support several core features of
OCaml, including dynamic generation of exceptions, mutable records, and the module
system. Our analyser starts with transforming the OCaml typed AST into a
simpler language where evaluation order is explicit. This transformation required 
a lot of care and demanded a substantial implementation effort. One key aspect 
of this transformation is the disambiguation of pattern matching, as we chose to 
work with an exhaustive and non-ambiguous pattern matching primitive in order 
to simplify the analysis of programs. 


Our experiments on 290 OCaml programs show some encouraging results, both 
in terms of performance and precision. Still, some improvements are needed for 
the analysis to be applicable to larger code bases. In particular, the minimisation 
of abstract values requires some more study and fine tuning: while it plays a 
crucial role to analyse some examples in a reasonable time, it can also severely 
undermine the analyser’s performance in some other cases. 


At the moment, the analyser can deal with whole programs only. To analyse 
libraries more modularly, we plan to experiment with generating abstract values 
that over-approximate the inputs of a library’s function, based on their types. In 
the near future, we also plan to extend the analyser with OCaml features that 
are yet to be supported (e.g., arrays, laziness, floats, objects, recursive modules, 
interactions with the operating system, etc.), most of which will require substantial 
formalisation and implementation efforts. Recently introduced features, such as 
algebraic effects and one-shot continuations, are also on our agenda, and are 
likely to raise interesting challenges. 


Finally, we hope that our abstract interpreter can be extended to perform 
other kinds of static analyses for OCaml programs, such as a purity analysis, 
or the detection of whether the behaviour of a program might depend on the 
order of evaluation. We would also like our implementation to serve as a basis 
for experimenting with recent relational domains for trees and scalars [29]612], 
and with relational analyses of functional programs [49]. 


Data-Availability Statement. The companion artefact is hosted on the
Zenodo platform and referenced by the DOI 10.5281/zenodo.10457925.


Abstract. Legal expert systems routinely rely on date computations to
determine the eligibility of a citizen to social benefits or whether an ap- 
plication has been filed on time. Unfortunately, date arithmetic exhibits 
many corner cases, which are handled differently from one library to the 
other, making faithfully transcribing the law into code error-prone, and 
possibly leading to heavy financial and legal consequences for users. 

In this work, we aim to provide a solid foundation for date arithmetic 
working on days, months and years. We first present a novel, formal se- 
mantics for date computations, and formally establish several semantic 
properties through a mechanization in the F* proof assistant. Building 
upon this semantics, we then propose a static analysis by abstract inter- 
pretation to automatically detect ambiguities in date computations. We 
finally integrate our approach in the Catala language, a recent domain- 
specific language for formalizing computational law, and use it to analyze 
the Catala implementation of the French housing benefits, leading to the 
discovery of several date-related ambiguities. 


Keywords: Verification, Semantics, Abstract Interpretation 


1 Introduction 


From filesystems to web servers, time representations are pervasive in modern 
computer systems. While several libraries and standards were proposed through- 
out the years, current well-established approaches such as Unix time [53] used in 
the standard C library or Windows’ FILETIME [36] represent dates and time as 
a number of seconds or nanoseconds that have elapsed since an arbitrary date. 

This approach is sufficient for many use cases, in particular when dates are
only used for logging purposes, or for determining the chronology of two events.
However, it does not permit more complex arithmetic, for instance the addition
of months or years, which span a variable number of days. For these use cases,
mainstream programming languages offer different libraries that adopt different
conventions. For example, Python’s datetime module [46] forbids the addition
of months, while Java’s java.time library [43] silently rounds invalid dates down to
the latest valid preceding date, hiding ambiguous computations from programmers.

Given the variety of libraries and behaviors across languages, programming
with date arithmetic is thus highly error-prone, and developers’ assumptions
about how dates behave might vary from project to project. When developing
systems whose correctness is critical and that heavily depend on date computations,
such as expert legal systems that rule our social and financial lives,
this issue becomes highly concerning. As an example, consider the following excerpt
from Section 121 of the US Internal Revenue Code [25], which defines the
“Exclusion of gain from sale of principal residence”.


In the case of a sale or exchange of property by an unmarried individual whose 
spouse is deceased on the date of such sale, paragraph (1) shall be applied by 
substituting “$500,000” for “$250,000” if such sale occurs not later than 2 years 
after the date of death of such spouse and the requirements of paragraph (2)(A) 
were met immediately before such date of death. 


This paragraph differentiates between two cases, depending on whether a 
sale occurred not later than 2 years after a given date. While applying this para- 
graph is straightforward in most real-world cases, corner cases raise interesting 
questions. In particular, when considering leap years, what should be the result 
of adding two years to February 29th? When manually computing taxes, lawyers 
would be able to detect the ambiguity, and to reach a decision based on legal 
precedents. If handled automatically by a computer however, the computation 
may be done incorrectly; computing February 29 2004 + 2 years in Java using 
java.time would return February 28 2006, while performing the same compu- 
tation using the date utility from Coreutils returns March 1 2006. 
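
The same ambiguity can be observed with Python's standard datetime module. The
sketch below is only an illustration: the helper names add_years_strict and
add_years_rounding are hypothetical, and the two rounding choices simply mirror
the java.time and Coreutils results quoted above.

```python
from datetime import date


def add_years_strict(d, years):
    """Add years with the standard library only.

    date.replace raises ValueError when the target day does not exist,
    which is what happens for February 29 in a non-leap target year.
    """
    return d.replace(year=d.year + years)


def add_years_rounding(d, years, round_up):
    """Hypothetical helper: resolve the ambiguity explicitly."""
    try:
        return add_years_strict(d, years)
    except ValueError:
        # Only February 29 can become invalid when adding whole years.
        if round_up:
            return date(d.year + years, 3, 1)
        return date(d.year + years, 2, 28)


start = date(2004, 2, 29)
rounded_down = add_years_rounding(start, 2, round_up=False)
rounded_up = add_years_rounding(start, 2, round_up=True)
```

Here rounded_down is February 28, 2006 and rounded_up is March 1, 2006, while
add_years_strict(start, 2) raises an exception instead of choosing silently.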

Similar computations are pervasive in expert legal systems; the corresponding
regulations rely on them to determine whether a citizen is eligible for social
benefits or a resident for tax purposes. Errors in such systems can have dra- 
matic consequences; case in point, the incorrect implementation of Louvois, the 
former French military payroll system, led to several families either receiving 
over-payments that they had to reimburse years later, or incomplete paychecks 
totaling a few cents [42]. For such critical software, it is therefore paramount to 
provide clear semantics for date computations to avoid mistakes based on erro- 
neous assumptions about a library’s behavior. Additionally, such a semantics can 
form the basis for further analyses, paving the way for the automated detection 
of date-related ambiguities as part of the development process. 

Unfortunately, while elegant in theory, a universal semantics for dates and 
date arithmetic would not be usable in practice; when possible ambiguities are 
identified in law texts, legislators oftentimes extend or modify the law itself to 
avoid them. For instance, article 641 of the French civil procedure code [30] spec- 
ifies that, when adding a positive duration to a date to compute a deadline, the 
rounding, if needed, should go down. Such articles often have narrow application 
scopes; similar articles in other branches of the law might either leave rounding 
unspecified, or adopt a different convention. In the US, date computations when
filing motions are heavily specified; however, the complexity and number of corner
cases led to no fewer than 27 subsequent notes and amendments to provide
clarifications [14]. Other regulations instead attempt to escape ambiguities due
to month or year additions by reducing such computations to an unambiguous
number of days. Such regulations vary heavily depending on the country and the
branch of law considered: acts from the Council of European Communities consider
that a month should be treated as 30 days [15], while the Indian Supreme
Court took the opposite approach, ruling that the duration of a month for customs
purposes is variable [4]. To enable their adoption in a variety of contexts,
date libraries therefore require their semantics to be configurable by developers.

The lowest granularity of date arithmetic we focus on is the day level. Our 
literature review and communications with lawyers in different countries have 
indeed shown that this kind of date arithmetic is sufficient for the kind of tax 
and social benefits computations that are the core application target of Catala. 

In this paper, we aim to provide a sound foundation for critical software 
relying on date computations, through the following contributions: 


Formally Capturing Date Computations. We first present a formal seman- 
tics of date computations (Sec. 2). Our formalization relies on a base semantics, 
which is universal and does not specify a rounding mode but instead provides 
facilities to round on-demand. We leverage these facilities to derive a rounding- 
specific semantics for different rounding policies. We mechanize this semantics 
in the F* proof assistant, and prove several theorems establishing necessary con- 
ditions for, e.g., the monotonicity or associativity of computations (Sec. 3). As 
part of this mechanization, we also identify seemingly intuitive properties that 
do not hold in practice, and exhibit counter-examples. 


Automatically Detecting Date Ambiguities. Building on the semantics, 
we define a notion of rounding-insensitivity, which captures that the result of 
evaluating a program’s expression does not depend on the chosen rounding policy 
(Sec. 4). Aiming to automatically identify possibly harmful ambiguities, we then 
propose a new static analysis based on abstract interpretation [16] targeting 
this 2-safety hyperproperty. We implement our analysis in the Mopsa static 
analyzer [28, 29]. We show that with relational numerical abstract domains, 
our analysis enables precise reasoning. In addition, our implementation provides 
actionable counter-example hints which will help users understand why a given 
expression is rounding-sensitive. 


Contribution to Date Arithmetic Libraries. To enable the adoption of 
this work in existing projects, we implement an OCaml library abiding by our 
formal semantics, which exposes common rounding modes, as well as an option 
to abort when ambiguous computations are detected. Our library is standalone 
and open-source, and easily integrable in OCaml developments. We also survey 
the behavior of mainstream date arithmetic libraries (Sec. 6), and provide litmus 
tests that can be used to easily understand how a library behaves with respect 
to date rounding. 


Case Study: Integration in the Catala Language. To demonstrate the
applicability of our approach in real-world programs, we replace previous
handling of dates in the Catala language [34], a recent domain-specific language for
formalizing computational law, by our library. We also extend the Mopsa [28, 29]
static analyzer to support a subset of the Catala language, enabling us to analyze
Catala programs for rounding-insensitivity. We evaluate our approach against an
existing Catala implementation of the French housing benefits, and automatically
identify several date-related ambiguities in the Catala model. This work is
in the process of being upstreamed in the Catala compiler.


\[
\begin{array}{llcl}
\text{date unit} & \delta & ::= & y \mid m \mid d \\
\text{rounding mode} & r & ::= & \uparrow \mid \downarrow \mid \bot \\
\text{values} & v & ::= & (y, m, d) \mid \bot \\
\text{expressions} & e & ::= & v \mid e +_\delta n \mid \mathrm{rnd}_r\ e \\
\text{period} & p & ::= & (n_d, n_m, n_y)
\end{array}
\]


Fig. 1. Date expressions


2 Formalizing Date Arithmetic 


We start this section by presenting a base semantics for date computations, which
does not explicitly specify a rounding policy to handle ambiguous dates. Date
expressions are presented in Fig. 1. Date values are represented in the
year-month-day format of the standard Gregorian calendar, where each component
is represented as an integer. We also include a $\bot$ element, which represents
an error case. Date expressions consist of either date values, or of the application
of one of the date operators. Date expressions also contain variables; however,
their treatment is straightforward and orthogonal to this work, so we omit them as
well as their associated environment in our presentation. Operators are of two
kinds: the addition $+_\delta$ of $n$ years, months, or days, where $n$ is an integer,
and the rounding $\mathrm{rnd}_r$ of a date. Our semantics supports three types of
rounding: $\mathrm{rnd}_\uparrow$ rounds up the current date to the nearest valid date;
$\mathrm{rnd}_\downarrow$ rounds down, and $\mathrm{rnd}_\bot$ raises an error if the
current date is invalid. A period is a triple of (possibly negative)
integers, respectively representing the numbers of days, months and years.

We now define a formal semantics for evaluating expressions. We start by de- 
scribing the semantics of date addition, presented in Fig. 2. To match standard 
date formats, we start counting at 1 for valid days and months; to simplify the 
presentation, we will often represent months using their name instead of their 
number (e.g., Jan instead of 1). Our semantics is designed to preserve the follow- 
ing invariant: assuming the date on the left is initially valid, any non-ambiguous 
computation will return a valid date. When the computation is ambiguous, the 
resulting date is between the largest smaller and the smallest larger valid date. 

Our semantics is defined recursively. Consider for instance the addition of
a number of days n. If n is small enough to remain in the same month and
year, we are in the terminal case and the rule ADD-DAYS applies. The first
premise of the rule ensures that the date is initially valid. It relies on an auxiliary
function nb_days, omitted for brevity, which computes the number of days for a
month in a given year (e.g., 31 for January, and 28 or 29 for February depending
on the year). Otherwise, we either add a month (rule ADD-DAYS-OVER) or
remove a month (rule ADD-DAYS-UNDER2) and perform a new addition with
an updated number of days. When the initial date is invalid, we return $\bot$ to
avoid propagating large errors and maintain important properties about date
semantics that we prove in Sec. 3. When composing additions, it might therefore
be necessary to apply rounding operators presented later in this section to avoid
$\bot$. One last point of interest in these semantics is the asymmetry between
the ADD-DAYS-OVER and ADD-DAYS-UNDER-* rules. Since adding a number
of days is never ambiguous, we wish to ensure that, assuming the initial date is
valid, we never apply the ADD-DAYS-ERR1 or ADD-DAYS-ERR2 rules. To do so,
when updating the month or year during day addition, we always go through an
intermediate state corresponding to the first day of the month, which is always a
valid day independently of the month and year. For brevity, we also omit several
redundant error cases, where the current month does not belong to the interval
[1; 12]; these cases return $\bot$. Following standard notations, we will denote the
transitive closure of our small-step semantics as $\rightarrow^*$.


\[
\begin{array}{l}
\text{ADD-YEAR}\quad \dfrac{}{(y, m, d) +_y n \rightarrow (y + n, m, d)}
\\[2ex]
\text{ADD-MONTH-UNDER}\quad \dfrac{m + n < 1}{(y, m, d) +_m n \rightarrow (y - 1, m, d) +_m (n + 12)}
\\[2ex]
\text{ADD-MONTH}\quad \dfrac{1 \leq m + n \leq 12}{(y, m, d) +_m n \rightarrow (y, m + n, d)}
\\[2ex]
\text{ADD-MONTH-OVER}\quad \dfrac{m + n > 12}{(y, m, d) +_m n \rightarrow (y + 1, m, d) +_m (n - 12)}
\\[2ex]
\text{ADD-DAYS-OVER}\quad \dfrac{1 \leq d \leq \mathit{nb\_days}(y, m) \qquad d + n > \mathit{nb\_days}(y, m)}{(y, m, d) +_d n \rightarrow ((y, m, 1) +_m 1) +_d (n - (\mathit{nb\_days}(y, m) - d) - 1)}
\\[2ex]
\text{ADD-COMP}\quad \dfrac{e \rightarrow e'}{e +_\delta n \rightarrow e' +_\delta n}
\\[2ex]
\text{ADD-DAYS-UNDER1}\quad \dfrac{1 < d \leq \mathit{nb\_days}(y, m) \qquad d + n \leq 0}{(y, m, d) +_d n \rightarrow (y, m, 1) +_d (d - 1 + n)}
\\[2ex]
\text{ADD-DAYS-UNDER2}\quad \dfrac{n + 1 \leq 0 \qquad (y, m, 1) +_m (-1) \rightarrow (y', m', d')}{(y, m, 1) +_d n \rightarrow (y', m', 1) +_d (n + \mathit{nb\_days}(y', m'))}
\\[2ex]
\text{ADD-DAYS-ERR1}\quad \dfrac{d < 1}{(y, m, d) +_d n \rightarrow \bot}
\qquad
\text{ADD-DAYS-ERR2}\quad \dfrac{d > \mathit{nb\_days}(y, m)}{(y, m, d) +_d n \rightarrow \bot}
\\[2ex]
\text{ADD-DAYS}\quad \dfrac{1 \leq d \leq \mathit{nb\_days}(y, m) \qquad 1 \leq d + n \leq \mathit{nb\_days}(y, m)}{(y, m, d) +_d n \rightarrow (y, m, d + n)}
\end{array}
\]


Fig. 2. Semantics for date addition
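
As an illustration, the addition rules of Fig. 2 can be transcribed into a
short Python sketch. This is not the mechanized F* development nor the OCaml
library: the function names are assumptions, dates are plain (y, m, d) tuples,
None stands for the error value, each loop iteration plays the role of one
reduction step, and month-range errors are omitted as in the text.

```python
def is_leap(y):
    """Gregorian leap-year rule."""
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)


def nb_days(y, m):
    """Number of days of month m (1..12) in year y."""
    if m == 2:
        return 29 if is_leap(y) else 28
    return 30 if m in (4, 6, 9, 11) else 31


def add_years(date, n):
    y, m, d = date
    return (y + n, m, d)                 # ADD-YEAR: the day is left untouched


def add_months(date, n):
    y, m, d = date
    while not 1 <= m + n <= 12:
        if m + n < 1:                    # ADD-MONTH-UNDER
            y, n = y - 1, n + 12
        else:                            # ADD-MONTH-OVER
            y, n = y + 1, n - 12
    return (y, m + n, d)                 # ADD-MONTH: result may be an invalid date


def add_days(date, n):
    y, m, d = date
    while True:
        if d < 1 or d > nb_days(y, m):   # ADD-DAYS-ERR1 / ADD-DAYS-ERR2
            return None
        if 1 <= d + n <= nb_days(y, m):  # ADD-DAYS (terminal case)
            return (y, m, d + n)
        if d + n > nb_days(y, m):        # ADD-DAYS-OVER: jump to the 1st of next month
            n = n - (nb_days(y, m) - d) - 1
            y, m, d = add_months((y, m, 1), 1)
        elif d > 1:                      # ADD-DAYS-UNDER1: go back to the 1st
            n, d = d - 1 + n, 1
        else:                            # ADD-DAYS-UNDER2: move to the previous month
            y, m, _ = add_months((y, m, 1), -1)
            n = n + nb_days(y, m)
```

For instance, add_months((2023, 1, 31), 1) returns the invalid date
(2023, 2, 31) without rounding it, and a subsequent add_days on that value
returns None, which is why rounding operators are needed when composing
additions.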

The last step is now to define semantics for rounding, shown in Fig. 3.
Compared to additions, the rounding semantics is simpler: if the date is already
valid, any mode of rounding leaves the date unchanged (ROUND-NOOP).
Otherwise, rounding down (ROUND-DOWN) returns the last day of the current month,
rounding up (ROUND-UP) returns the first day of the next month, while the
strict rounding mode (ROUND-ERR2) raises an error. In all cases, if the day is
initially smaller than 1, rounding returns $\bot$ (ROUND-ERR1); we will prove in
Sec. 3 that this never happens when starting from a valid date.


\[
\begin{array}{l}
\text{ROUND-ERR1}\quad \dfrac{d < 1}{\mathrm{rnd}_r\,(y, m, d) \rightarrow \bot}
\qquad
\text{ROUND-ERR2}\quad \dfrac{d > \mathit{nb\_days}(y, m)}{\mathrm{rnd}_\bot\,(y, m, d) \rightarrow \bot}
\\[2ex]
\text{ROUND-DOWN}\quad \dfrac{d > \mathit{nb\_days}(y, m)}{\mathrm{rnd}_\downarrow\,(y, m, d) \rightarrow (y, m, \mathit{nb\_days}(y, m))}
\\[2ex]
\text{ROUND-NOOP}\quad \dfrac{1 \leq d \leq \mathit{nb\_days}(y, m)}{\mathrm{rnd}_r\,(y, m, d) \rightarrow (y, m, d)}
\\[2ex]
\text{ROUND-UP}\quad \dfrac{d > \mathit{nb\_days}(y, m) \qquad (y, m, d) +_m 1 \rightarrow (y', m', d')}{\mathrm{rnd}_\uparrow\,(y, m, d) \rightarrow (y', m', 1)}
\end{array}
\]


Fig. 3. Semantics for date rounding

Separating additions and rounding has several benefits. Different use cases 
might require different rounding modes, and different ways of adding days, 
months, and years. For instance, when adding a period such as 1 year and 10 
months, some settings might specify that months should be added first, or that 
rounding must be performed after adding months, and again after adding years; 
our formal semantics enables this flexibility. 

The last remaining step is to define additions not just for individual days, 
months, or years, but for composite time periods. Building upon our semantics, 
we can define this generically for a rounding mode r as follows, and avoid the 
need for users to manually call rounding operators. 


\[
e +_r (y, m, d) := \mathrm{rnd}_r((e +_y y) +_m m) +_d d
\]


One point of interest in our derived forms is that we only apply rounding 
after performing addition of years and months. Indeed, adding a year should be 
equivalent to adding 12 months. However, if we performed rounding after each 
operation, adding 1 year and 1 month to February 29, 2020 with the rounding-up
mode would return April 1, 2021 instead of March 29, 2021. We emphasize that,
in cases where this behavior would be expected, defining derived forms corre- 
sponding to this semantics would be straightforward using our base semantics. 
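
The derived form and the alternative that rounds after each operation can be
compared on this example with a small Python sketch. The names below
(add_period, round_each_step, the mode strings) are assumptions made for the
illustration; month addition is written in closed form, None stands for the
error value, and day addition on a valid date is delegated to the standard
datetime module.

```python
from datetime import date, timedelta


def is_leap(y):
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)


def nb_days(y, m):
    if m == 2:
        return 29 if is_leap(y) else 28
    return 30 if m in (4, 6, 9, 11) else 31


def add_months(dt, n):
    """Base month addition: the day is untouched, so the result may be invalid."""
    y, m, d = dt
    total = (m - 1) + n
    return (y + total // 12, total % 12 + 1, d)


def rnd(dt, mode):
    """Rounding in mode 'up', 'down' or 'strict'; None stands for the error value."""
    y, m, d = dt
    if d < 1:
        return None
    if d <= nb_days(y, m):
        return dt
    if mode == "down":
        return (y, m, nb_days(y, m))
    if mode == "up":
        y2, m2, _ = add_months(dt, 1)
        return (y2, m2, 1)
    return None


def add_period(start, period, mode, round_each_step=False):
    """Add (years, months, days) to a valid date; rounds once unless asked otherwise."""
    years, months, days = period
    y, m, d = start
    after_years = (y + years, m, d)
    if round_each_step:  # alternative behavior, not the derived form
        after_years = rnd(after_years, mode)
        if after_years is None:
            return None
    rounded = rnd(add_months(after_years, months), mode)
    if rounded is None:
        return None
    result = date(*rounded) + timedelta(days=days)
    return (result.year, result.month, result.day)


single_rounding = add_period((2020, 2, 29), (1, 1, 0), "up")
eager_rounding = add_period((2020, 2, 29), (1, 1, 0), "up", round_each_step=True)
```

With these definitions, single_rounding is (2021, 3, 29) whereas
eager_rounding is (2021, 4, 1), matching the discussion above.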

Based on this semantics, we can now formally define the notion of an am- 
biguous date expression in Definition 1. 


Definition 1 (Ambiguous expression). A date expression e is ambiguous
if and only if $\mathrm{rnd}_\bot(e) \rightarrow^* \bot$.


Note that this intensional definition of ambiguity is equivalent to stating that
an expression e is ambiguous if and only if rounding e in different modes
yields different dates.
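
A minimal Python sketch of Definition 1 and of this equivalent formulation,
restricted to month additions starting from valid dates, could look as
follows. The helper names and the use of None for the error value are
assumptions made for the illustration.

```python
def is_leap(y):
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)


def nb_days(y, m):
    if m == 2:
        return 29 if is_leap(y) else 28
    return 30 if m in (4, 6, 9, 11) else 31


def add_months(dt, n):
    """Base month addition: the day is untouched, so the result may be invalid."""
    y, m, d = dt
    total = (m - 1) + n
    return (y + total // 12, total % 12 + 1, d)


def rnd(dt, mode):
    """Rounding in mode 'up', 'down' or 'strict'; None stands for the error value."""
    y, m, d = dt
    if d < 1:
        return None
    if d <= nb_days(y, m):
        return dt
    if mode == "down":
        return (y, m, nb_days(y, m))
    if mode == "up":
        y2, m2, _ = add_months(dt, 1)
        return (y2, m2, 1)
    return None


def is_ambiguous(dt):
    """Definition 1: strict rounding evaluates to the error value."""
    return rnd(dt, "strict") is None


def modes_disagree(dt):
    """Equivalent view: rounding down and rounding up give different dates."""
    return rnd(dt, "down") != rnd(dt, "up")


jan31_plus_1m = add_months((2023, 1, 31), 1)  # (2023, 2, 31), not a valid date
jan28_plus_1m = add_months((2023, 1, 28), 1)  # (2023, 2, 28), valid
```

Here is_ambiguous(jan31_plus_1m) holds while is_ambiguous(jan28_plus_1m) does
not, and both predicates agree on every month addition from a valid date.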

While the semantics presented in this section focuses on the core, possibly 
ambiguous computations, our work also includes other non-ambiguous operators 
(omitted for brevity), e.g., to retrieve the first or last day of a given month. This
allows encoding a variety of patterns, for instance, the second-to-last day of a
month by combining date arithmetic with the “last day of month” operator, or
relying on a preprocessing phase if months must be treated as 30 days [15]. Our
semantics supports reasoning on computations mixing rounding modes.


3 Mechanizing Semantics 


Building upon the semantics presented in the previous section, we now present 
several properties of interest related to date computations that we will rely 
upon when designing a static analysis in Sec. 4. As part of our contributions, we 
mechanize our semantics, related properties and their proofs inside the F* proof 
assistant [52]. 


3.1 Semantic properties 


As part of our proof development, we separate semantic properties into two
categories: properties established on the base semantics, valid for all derived forms,
and properties derived on specific rounding modes. In many cases, proofs on 
derived forms can be performed efficiently by composing lemmas on base se- 
mantics, thus simplifying the proof effort. During development, we also encode 
our OCaml implementation of date computations and corresponding theorems 
into qcheck [54], a QuickCheck [13] inspired property-based testing framework 
for OCaml. We mostly used QuickCheck as a fast sanity check before spending 
time proving lemmas in F*. In particular, our initial intuition for several of the 
lemmas and theorems presented was often unreliable, omitting corner cases; we 
used QuickCheck to gain more confidence in our intuition before moving to F*. 
This encoding allowed us to automatically find most of the counter-examples 
presented in Sec. 3.2. 

We start by proving that expressions in our semantics always evaluate to a 
value (possibly L), i.e., reduction is never stuck and it terminates. 


Theorem 1 (Normalization). For any date d, and any integer n, there exists
a value v such that $d +_\delta n \rightarrow^* v$.


In addition to normalization, a useful property about our semantics is a char- 
acterization of valid computations: when using any of the non-abort rounding 
modes, an addition starting from a valid date will always return a valid date; 
the definition of validity is straightforward, but omitted for brevity. To prove it, 
we need the following properties on base semantics, which we prove by induction 
on the reductions. 


Lemma 1 (Well-formedness of day addition). For any valid date d, any
integer n, and any value v, $d +_d n \rightarrow^* v \Rightarrow v \neq \bot$.


Lemma 2 (Well-formedness of year/month addition). For any valid date
d, any integer n, any value v, and $\delta \in \{y, m\}$, we have
$d +_\delta n \rightarrow^* v \Rightarrow v \neq \bot \land \mathit{day\_of}(v) \geq 1$.


Lemma 3 (Well-formedness of rounding). For any date d such that $d \neq \bot$,
any value v, and $r \in \{\uparrow, \downarrow\}$, we have
$\mathrm{rnd}_r\ d \rightarrow^* v \Rightarrow \mathit{valid}(v)$.


We can now state the following theorem on the derived semantics. 


Theorem 2 (Well-formedness). For any valid date d, any period p, any value
v, and $r \in \{\downarrow, \uparrow\}$, we have $d +_r p \rightarrow^* v \Rightarrow \mathit{valid}(v)$.


We now present several theorems related to the monotonicity of the addi- 
tion in our semantics. Date comparison is defined in the standard way, as the 
lexicographical order on (y,m,d). To simplify the presentation, we lift the com- 
parison operators to operate on date expressions, defined as the comparison on 
the values obtained by evaluating the expressions. 


Theorem 3 (Monotonicity). For any dates $d_1, d_2$, for any period p, for
$r \in \{\downarrow, \uparrow\}$, if $d_1 < d_2$, then $d_1 +_r p \leq d_2 +_r p$.


A point of interest in this theorem is the discrepancy between bounds: while
the bound in the premise is strict, the bound in the conclusion is not.
Unfortunately, a stronger version with strict bounds on both sides does not hold;
for instance, two additions whose intermediate results are April 30 and April 31
respectively yield the same date, April 30, when rounding down. To prove this
theorem, we again need several
intermediate lemmas operating on base semantics. First, we establish an equiva- 
lence between adding years and adding months. We then state and prove several 
monotonicity properties on the base semantics. The proof of Theorem 3 follows 
by direct application of these lemmas. 


Lemma 4 (Equivalence of year and month addition). For all dates d, for
all integers n, $d +_y n = d +_m (12 \cdot n)$.

Lemma 5 (Monotonicity of year and month addition). For all dates
$d_1, d_2$, for any integer n, for $\delta \in \{y, m\}$,
$d_1 < d_2 \Rightarrow d_1 +_\delta n < d_2 +_\delta n$.

Lemma 6 (Monotonicity of day addition). For all valid dates $d_1, d_2$, for
any integer n, $d_1 < d_2 \Rightarrow d_1 +_d n < d_2 +_d n$.

Lemma 7 (Monotonicity of rounding). For all dates $d_1, d_2$, for $r \in \{\downarrow, \uparrow\}$,
$d_1 < d_2 \Rightarrow \mathrm{rnd}_r(d_1) \leq \mathrm{rnd}_r(d_2)$.
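
In the spirit of the property-based sanity checks mentioned in Sec. 3.1,
Lemma 4 and Theorem 3 can be tested on random inputs with a short Python
sketch. It uses the standard random module rather than qcheck, and every name
in it is an assumption made for the illustration; a passing run is of course
no substitute for the F* proofs.

```python
import random
from datetime import date, timedelta


def is_leap(y):
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)


def nb_days(y, m):
    if m == 2:
        return 29 if is_leap(y) else 28
    return 30 if m in (4, 6, 9, 11) else 31


def add_months(dt, n):
    y, m, d = dt
    total = (m - 1) + n
    return (y + total // 12, total % 12 + 1, d)


def rnd(dt, mode):
    """Rounding of a date whose day is at least 1, in mode 'up' or 'down'."""
    y, m, d = dt
    if d <= nb_days(y, m):
        return dt
    if mode == "down":
        return (y, m, nb_days(y, m))
    y2, m2, _ = add_months(dt, 1)
    return (y2, m2, 1)


def add_period(start, period, mode):
    """Derived form: add years and months, round once, then add days."""
    years, months, days = period
    y, m, d = start
    rounded = rnd(add_months((y + years, m, d), months), mode)
    result = date(*rounded) + timedelta(days=days)
    return (result.year, result.month, result.day)


def random_valid_date(rng):
    y, m = rng.randint(1900, 2100), rng.randint(1, 12)
    return (y, m, rng.randint(1, nb_days(y, m)))


def check_properties(trials=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        d1, d2 = sorted(random_valid_date(rng) for _ in range(2))
        n = rng.randint(-30, 30)
        y, m, d = d1
        # Lemma 4: adding n years equals adding 12 * n months.
        assert (y + n, m, d) == add_months(d1, 12 * n)
        # Theorem 3: the derived addition is monotone (non-strictly).
        p = (rng.randint(-3, 3), rng.randint(-20, 20), rng.randint(-400, 400))
        for mode in ("down", "up"):
            assert add_period(d1, p, mode) <= add_period(d2, p, mode)
    return True
```

The non-strictness of Theorem 3 shows up directly: adding one month to
March 30 and to March 31 with rounding down gives April 30 in both cases.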


Finally, we state the following theorem, which guarantees that rounding down
never returns a later date than rounding up. Additionally, when the
addition is not ambiguous, all rounding modes return the same result.


Theorem 4 (Rounding).


1. For all dates d, for all periods p, $d +_\downarrow p \leq d +_\uparrow p$.
2. For all dates d, for all periods p, $d +_\bot p \neq \bot \Rightarrow d +_\downarrow p = d +_\uparrow p = d +_\bot p$.


We finally characterize the ambiguity of month addition, a property that we 
will need to prove the soundness of the static analysis presented in Sec. 4. 

Theorem 5 (Characterization of ambiguous month additions). For all
valid dates d, for all integers n, for all values v such that
$d +_m n \rightarrow^* v$, we have
$\mathit{nb\_days}(\mathit{year\_of}(v), \mathit{month\_of}(v)) < \mathit{day\_of}(v) \Leftrightarrow \mathrm{rnd}_\bot(v) \rightarrow^* \bot$.


3.2 Non-properties and counter-examples


We now present several seemingly intuitive and ideally useful properties about 
date semantics that do not hold in practice. 


Non-Property 1 (Commutativity of addition) For all dates d, for all periods
$p_1, p_2$, for all $r \in \{\uparrow, \downarrow\}$, we have
$(d +_r p_1) +_r p_2 = (d +_r p_2) +_r p_1$.


Consider the case where d = March 31, $p_1$ = 1 day, and $p_2$ = 1 month. When
adding $p_1$ first and rounding down, the addition returns April 30, while the result
when adding $p_2$ first will be May 1. Similar examples exist when rounding up,
for instance, by setting d = January 29, 2023, $p_1$ = 30 days, and $p_2$ = 1 month.


Non-Property 2 (Associativity of addition) For all dates d, for all periods
$p_1, p_2$, for $r \in \{\downarrow, \uparrow\}$, we have
$(d +_r p_1) +_r p_2 = d +_r (p_1 + p_2)$.


Consider the case where d = March 31, $p_1$ = 1 month, and $p_2$ = 1 month. In
both rounding modes, adding $p_1$ followed by $p_2$ will require rounding, ultimately
yielding May 30 or June 1, while directly adding $p_1 + p_2$ returns May 31.
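
The January 29, 2023 example and the associativity example can be replayed
with a small Python sketch. The year 2023 for March 31 is an assumption (the
text leaves it unspecified), as are the helper names; periods are written
(years, months, days) and day addition is delegated to the datetime module.

```python
from datetime import date, timedelta


def is_leap(y):
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)


def nb_days(y, m):
    if m == 2:
        return 29 if is_leap(y) else 28
    return 30 if m in (4, 6, 9, 11) else 31


def add_months(dt, n):
    y, m, d = dt
    total = (m - 1) + n
    return (y + total // 12, total % 12 + 1, d)


def rnd(dt, mode):
    """Rounding of a date whose day is at least 1, in mode 'up' or 'down'."""
    y, m, d = dt
    if d <= nb_days(y, m):
        return dt
    if mode == "down":
        return (y, m, nb_days(y, m))
    y2, m2, _ = add_months(dt, 1)
    return (y2, m2, 1)


def add_period(start, period, mode):
    """Derived form: add years and months, round once, then add days."""
    years, months, days = period
    y, m, d = start
    rounded = rnd(add_months((y + years, m, d), months), mode)
    result = date(*rounded) + timedelta(days=days)
    return (result.year, result.month, result.day)


# Commutativity: d = January 29, 2023, p1 = 30 days, p2 = 1 month, rounding up.
d, p1, p2 = (2023, 1, 29), (0, 0, 30), (0, 1, 0)
days_then_month = add_period(add_period(d, p1, "up"), p2, "up")
month_then_days = add_period(add_period(d, p2, "up"), p1, "up")

# Associativity: d = March 31, two additions of 1 month versus one of 2 months.
start, one_month, two_months = (2023, 3, 31), (0, 1, 0), (0, 2, 0)
stepwise = {mode: add_period(add_period(start, one_month, mode), one_month, mode)
            for mode in ("down", "up")}
direct = {mode: add_period(start, two_months, mode) for mode in ("down", "up")}
```

Under these assumptions, days_then_month is March 28 while month_then_days is
March 31, and the stepwise additions give May 30 (down) or June 1 (up) whereas
the direct addition gives May 31 in both modes.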

Since addition is associative and commutative for most datatypes, we emphasize
that the failure of these properties for dates can be a source of confusion
for programmers; common optimizations or rewritings of date computations in a 
seemingly equivalent way (e.g., replacing 1 month + 1 month by 2 months) can 
lead to different outcomes. However, these disparities are exclusively due to oc- 
currences of rounding in computations. We thus aim to help programmers when 
handling date computations by proposing a static analysis that automatically 
detects when rounding might impact the evaluation of expressions. 


4 A Static Analysis For Rounding-Insensitivity 


In this section, we leverage our formal semantics to define a sound static analysis
that automatically verifies programs performing date computations. Our goal is to
statically detect ambiguous computations, whose result depends on the chosen rounding
mode. Indeed, when writing programs whose specification is the law, choosing
the rounding mode arbitrarily is not a possibility; this would amount to a legal
interpretation that exposes the administration operating the program to
court challenges if the rounding mode is unfavorable to a user. The cost of
bearing the responsibility for making technical regulatory choices for adminis- 
tration personnel has been documented by Torny [55]. 

A naive approach would be to flag any program which contains an ambiguous
addition. However, this solution can be overly restrictive: computations can be
ambiguous while having no impact on the final outcome of the program. Consider
for example the expression d + 1 month <= March 15 2023. If no rounding
happens when adding d and 1 month, then the expression is obviously safe.
Otherwise, we notice that the rounding may only happen to yield the last day of
a month, or the first day of the following month. In both cases, comparing
this result with a date in the middle of a month is thus safe. Instead, we consider
a more interesting property called rounding-insensitivity, capturing that
the evaluation of an expression is the same for both rounding modes.


1 date current = random_date();
2 date birthday = random_date();
3 date intermediate = birthday + [2 years, 0 months, 0 days];
4 date limit = first_day_of(intermediate);
5 assert(sync(current < limit));


Fig. 4. Example extracted from Catala code modeling the French housing benefits

At a high-level, our analysis works by tracking constraints over the day, 
month, and year of a date, through the YMD domain (Sec. 4.1). The YMD 
domain is fully parametric in a numerical abstract domain, and works by trans- 
lating date constraints into numerical constraints. We discuss the choice of nu- 
merical abstract domains in Sec. 4.2, in order to obtain the best precision in the 
presence of linear constraints and unconstrained dates. We analyze the compu- 
tations with both rounding modes and compare the result to decide rounding- 
insensitivity, which is a 2-safety hyperproperty. We explain how we lift the YMD 
domain to these double computations in Sec. 4.3. We implemented our analysis 
within the Mopsa static analysis platform [28, 29], described in Sec. 4.4. We have 
taken special care in ensuring that actionable counter-examples can be generated 
in Sec. 4.5, paving the way for use by non-experts. 

We think that abstract interpretation hits a sweet spot to perform this analysis.
Its full automation makes it usable by non-specialists, especially with the provided
counter-example hints. It allows us to derive tailored approximations thanks
to Th. 5. By contrast, the current definition of date addition is recursive and
involves non-linear arithmetic constraints, which SMT solvers do not handle well.

We use as a motivating example the program shown in Fig. 4. This program 
has been extracted from a Catala code snippet used to formalize the French 
housing benefits [33, Sec. 3.1]. We will provide more details on Catala and the 
extraction to date programs in Sec. 5. In this program, we pick two arbitrary, 
unconstrained dates, perform a date-duration addition of two years, and project 
the resulting date onto the first day of its month. The assertion at line 5 expresses 
the rounding-insensitivity of the comparison between an arbitrary, unconstrained 
date and the computed date. The sync predicate, formally defined in Sec. 4.3, 
holds if and only if the evaluation of its expression in both rounding modes yields 
the same result, meaning that the expression is rounding-insensitive. 
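
On concrete inputs, the sync predicate can be approximated by brute force:
evaluate the expression under both rounding modes and compare. The Python
sketch below does this for the expression d + 1 month <= March 15 2023 and
for the program of Fig. 4. It is an enumeration over a finite range of dates,
not the abstract-interpretation analysis, and all names in it are assumptions.

```python
def is_leap(y):
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)


def nb_days(y, m):
    if m == 2:
        return 29 if is_leap(y) else 28
    return 30 if m in (4, 6, 9, 11) else 31


def add_months(dt, n):
    y, m, d = dt
    total = (m - 1) + n
    return (y + total // 12, total % 12 + 1, d)


def rnd(dt, mode):
    """Rounding of a date whose day is at least 1, in mode 'up' or 'down'."""
    y, m, d = dt
    if d <= nb_days(y, m):
        return dt
    if mode == "down":
        return (y, m, nb_days(y, m))
    y2, m2, _ = add_months(dt, 1)
    return (y2, m2, 1)


def all_valid_dates(first_year, last_year):
    for y in range(first_year, last_year + 1):
        for m in range(1, 13):
            for d in range(1, nb_days(y, m) + 1):
                yield (y, m, d)


def sync(expr, *dates):
    """True iff expr evaluates identically under both rounding modes."""
    return expr("down", *dates) == expr("up", *dates)


def before_mid_march(mode, d):
    # d + 1 month <= March 15 2023
    return rnd(add_months(d, 1), mode) <= (2023, 3, 15)


def fig4(mode, current, birthday):
    # Lines 3 to 5 of Fig. 4; 2 years are added as 24 months (Lemma 4).
    y, m, _ = rnd(add_months(birthday, 24), mode)
    limit = (y, m, 1)
    return current < limit


insensitive = all(sync(before_mid_march, d) for d in all_valid_dates(2022, 2024))
witness = ((2006, 2, 15), (2004, 2, 29))  # (current, birthday)
fig4_is_sync_on_witness = sync(fig4, *witness)
```

In this sketch, insensitive is True, whereas fig4_is_sync_on_witness is
False: for a birthday on February 29, 2004, the limit is February 1, 2006
when rounding down and March 1, 2006 when rounding up, so a current date of
February 15, 2006 is compared differently in the two modes.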

The programs we consider in this section are written in a standard, toy 
imperative language. 


4.1 The YMD domain combinator 


The YMD domain translates constraints on the year, month and day of a date 
into numerical constraints over three integer variables. These numerical con- 
straints are handled by a numerical abstract domain, described in Definition 2. 


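3 Here sync(current < limit) could be reduced to sync(limit). However, our analysis
will not need it, and will be able to provide counter-example hints also targeting
the values of current, improving readability of the output.


\[
\mathit{dates\_dom} : \left\{ \begin{array}{rcl}
(\mathcal{V} \to \mathbb{Z}) & \to & \mathcal{P}(\mathcal{V}) \\
\rho & \mapsto & \{\, v \mid \mathit{year}(v), \mathit{month}(v), \mathit{day}(v) \in \mathit{dom}(\rho) \,\}
\end{array} \right.
\]

\[
\gamma_{\mathrm{YMD}} : \left\{ \begin{array}{rcl}
\mathcal{N}^\# & \to & \mathcal{P}(\mathcal{V} \to \mathcal{D}) \\
n^\# & \mapsto & \bigcup_{\rho \in \gamma_{\mathcal{N}}(n^\#)} \{\, e \mid \mathit{dom}(e) = \mathit{dates\_dom}(\rho) \land \forall v \in \mathit{dom}(e),\ e(v) = (y, m, d) \\
 & & \quad \land\ \mathit{valid}(y, m, d) \land y = \rho(\mathit{year}(v)) \land m = \rho(\mathit{month}(v)) \land d = \rho(\mathit{day}(v)) \,\}
\end{array} \right.
\]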


Fig. 5. Concretization of the YMD domain 


The YMD domain can be seen as a domain combinator, or a functor relying on a 
numerical abstract domain — we will discuss the chosen instantiation in Sec. 4.2. 
This domain works at a fixed rounding mode. 


Definition 2 (Numerical abstract domain). In the following, a numerical
abstract domain is a lattice $\mathcal{N}^\#$ on which the following operations are defined:

— The assignment, assign, between a variable and an expression in a given 
abstract environment yields another abstract environment. 

— The boolean filtering of a state, assume, filters an abstract environment to 
enforce that a boolean expression holds. 

This domain is further defined by a concretization function
$\gamma_{\mathcal{N}} : \mathcal{N}^\# \to \mathcal{P}(\mathcal{V} \to \mathbb{Z})$
mapping numerical abstract environments to the set of concrete integer
environments they represent. We assume the numerical abstract domain is sound.


Given a date variable v, the YMD domain will create new auxiliary (or ghost) 
variables year(v), month(v), day(v), which do not exist in the original program 
but simplify reasoning. This is an approach we borrow from the deductive veri- 
fication community, and that has been used in static analyses both in the work 
of Chevalier and Feret [12] as well as in Mopsa. 

We provide a formal definition of the concretization, which defines the mean- 
ing of the YMD domain, and illustrate it on an example. 


Definition 3 (Concretization of the YMD domain). The concretization of
the YMD domain is formally defined in Fig. 5. It explains how an abstract numerical
environment $n^\# \in \mathcal{N}^\#$ can be interpreted as a set of date environments
$e \in \mathcal{V} \to \mathcal{D}$ mapping variables to dates. To construct these date
environments, we first pick an integer environment $\rho \in \mathcal{V} \to \mathbb{Z}$
from the concretization of the numerical abstract domain $\gamma_{\mathcal{N}}(n^\#)$.
The domain of each date environment is $\mathit{dates\_dom}(\rho)$, the set of variables
for which the auxiliary year, month and day variables are defined in $\rho$. For each
of those variables $v \in \mathit{dates\_dom}(\rho)$, $e(v)$ corresponds to the date
defined by the auxiliary variables in $\rho$, provided that the date is valid.


Example 1 (Concretization). Let us assume our numerical domain is a map from
variables to intervals, and consists of the following state:
$n^\# = \mathit{day}(d) \in [1, 31] \land \mathit{month}(d) \in [1, 12] \land \mathit{year}(d) = 2023$.
In that case, the concretization is the set
of date environments e defined on variable d such that e(d) can be any valid
date of 2023. Thus, there is a date environment $e \in \gamma_{\mathrm{YMD}}(n^\#)$
such that e(d) = (2023, 1, 31). However, there is no date environment such that
e(d) = (2023, 2, 29) and $e \in \gamma_{\mathrm{YMD}}(n^\#)$ because the date is
invalid (2023 is not a leap year).
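
Example 1 can be replayed by enumeration. The Python sketch below is a
simplification made for the illustration: it handles a single date variable
whose auxiliary variables are bounded by intervals, and returns the set of
dates rather than date environments. The names concretize and state are
assumptions.

```python
from itertools import product


def is_leap(y):
    return y % 4 == 0 and (y % 100 != 0 or y % 400 == 0)


def nb_days(y, m):
    if m == 2:
        return 29 if is_leap(y) else 28
    return 30 if m in (4, 6, 9, 11) else 31


def concretize(state):
    """Valid dates compatible with inclusive interval bounds on year, month, day."""
    years, months, days = (
        range(lo, hi + 1)
        for lo, hi in (state["year"], state["month"], state["day"])
    )
    return {
        (y, m, d)
        for y, m, d in product(years, months, days)
        # Keep only valid dates, as required by the concretization.
        if 1 <= m <= 12 and 1 <= d <= nb_days(y, m)
    }


# Abstract state of Example 1: day in [1, 31], month in [1, 12], year = 2023.
state = {"year": (2023, 2023), "month": (1, 12), "day": (1, 31)}
dates_2023 = concretize(state)
```

The resulting set contains the 365 valid dates of 2023: it includes
(2023, 1, 31) but not (2023, 2, 29).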


The YMD domain handles the following transfer functions: 

— Accessors to the day, month or year number of a date. Given a date encoded as 
a variable v, these functions return the associated variable day(v), month(v), 
year(v) respectively. 

— Projection of a date on the first day of the month: given a date encoded as a 
variable v, this function creates a new date having the same auxiliary month 
and year variables. The day auxiliary variable is set to 1. A similar operator 
working on the last day of the month can be defined. 

— The main part of the YMD domain is the transfer function handling month 
addition and potential rounding originating from this addition. We define it 
below, argue it is sound, and illustrate it on an example (in Sec. 4.2). As we 
have proved in Lemma 4, additions on years and months can be reduced to 
additions on months. Our current, potentially ambiguous, real-world examples 
taken from legislative code do not use day addition; as it is never ambiguous, 
we thus do not currently implement it. Given its similarity to month addition, 
we do not foresee any technical difficulty doing so. 

— The YMD domain also provides a transfer function to compare two dates. It 
is induced by the lexicographic definition of concrete date comparisons and 
partitions the results to improve the precision. 


Transfer function for month addition. We provide a simplified OCaml im- 
plementation for the month addition transfer function in Fig. 6. The transfer 
function takes as parameter a date, represented as a variable; a concrete num- 
ber of months; an input abstract state; and a rounding mode chosen for date 
computations. It will return a case disjunction* of type cases: a list of case, 
each consisting in an expression and an abstract state. We start by defining 
day, month, year, which are expressions representing the day, month and year 
number of date through auxiliary variables. The resulting month and year are 
computed through non-linear expressions. Similarly to the semantics, we encode 
months as integers to perform arithmetic operations, and start our numbering 
at 1 for January. The transfer function performs a case disjunction to detect if 
date rounding will happen, following the characterization of ambiguous month 
additions (Th. 5). This case disjunction checks whether the day of the date is 
compatible with the number of days in the resulting month (and year, as Febru- 
ary has one more day during leap years). This disjunction is encoded thanks 
to the switch utility, which takes as input an abstract state and a list of tuples
of expressions and continuations. Given a tuple (cond, k), the input abstract
state is filtered to satisfy the expression cond (by delegation to the numerical
abstract domain). The resulting abstract state is fed to the continuation, which
yields a case.

4 These disjunctions can be seen as a partitioning of the abstract state. In this section
we assume everything is partitioned to improve the precision. Our implementation
supports limiting the number of partitions.

The cases we encounter during the addition are:

Rounding to 29 Feb. of a leap year. If the resulting month is February of 
a leap year, and the current day number is greater than 29, we will have to 
perform date rounding. We do so using the auxiliary round function. Depending 
on the rounding mode, it either chooses the provided date, or the first of the 
month afterwards. This date is then returned in its corresponding abstract state 
using mk_date, whose implementation is not detailed. 

Rounding to 28 Feb. of a non-leap year. Similar case omitted for brevity.

Rounding to a 30-day month. If the current day number is 31 but the resulting
month has 30 days (i.e., it is either April, June, September or November),
we also have to perform a rounding, either to the 30th of the resulting month,
or the 1st of the month after.

Other cases. No rounding happens, the day number remains the same. 
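The four cases above can be mirrored on concrete dates. The following is a minimal sketch, not the paper's library: the record type, the function names and the restriction to a non-negative number of months are assumptions made for illustration.

```ocaml
(* Illustrative concrete counterpart of the abstract cases; all names are hypothetical. *)
type rounding = RoundUp | RoundDown

type date = { year : int; month : int; day : int } (* month: 1 = January *)

let is_leap y = (y mod 4 = 0 && y mod 100 <> 0) || y mod 400 = 0

let days_in_month y m =
  match m with
  | 2 -> if is_leap y then 29 else 28
  | 4 | 6 | 9 | 11 -> 30
  | _ -> 31

(* Add nb_m >= 0 months to d, rounding only when the day does not exist
   in the resulting month. *)
let add_months (r : rounding) (d : date) (nb_m : int) : date =
  let total = d.month - 1 + nb_m in
  let res_month = 1 + total mod 12 in
  let res_year = d.year + total / 12 in
  let last = days_in_month res_year res_month in
  if d.day <= last then { year = res_year; month = res_month; day = d.day }
  else
    match r with
    | RoundDown -> { year = res_year; month = res_month; day = last }
    | RoundUp ->
      (* First day of the following month. *)
      { year = res_year + res_month / 12; month = 1 + res_month mod 12; day = 1 }
```

For instance, adding one month to January 31st, 2023 gives February 28th when rounding down and March 1st when rounding up.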

Note that add_months, round and is_leap define syntactic expressions,
which will be delegated through assign and assume to the numerical abstract
domain. The expressions at lines 6, 13, 14, 21-22, 26, 28, 30 are not directly 
evaluated: they will be interpreted by the assume of the numerical abstract do- 
main during the evaluation of the switch function. The definition of the transfer 
function for month addition assumes the number of months to add is known as 
a concrete integer. This is not restrictive in practice: all programs we extracted 
from Catala in Sec. 5 only perform date-month addition with a concrete number 
of months. 
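To make the role of switch and assume concrete, here is a toy sketch in which the "abstract state" is just a finite list of candidate day numbers and conditions are a two-constructor syntax tree. These types are stand-ins chosen for illustration and are not Mopsa's.

```ocaml
(* Toy stand-ins: conditions are syntactic, and only [assume] interprets them. *)
type expr = Gt of int | Le of int (* conditions on the day number *)
type state = int list             (* candidate day numbers *)
type case = string * state        (* a label standing for the result expression *)

let eval (c : expr) (day : int) : bool =
  match c with
  | Gt n -> day > n
  | Le n -> day <= n

(* Filter the state so that it satisfies the condition. *)
let assume (c : expr) (abs : state) : state = List.filter (eval c) abs

(* Each branch filters the input state, then hands it to its continuation. *)
let switch (abs : state) (branches : (expr * (state -> case)) list) : case list =
  List.map (fun (cond, k) -> k (assume cond abs)) branches

let example : case list =
  switch [ 28; 29; 30; 31 ]
    [ (Gt 29, (fun s -> ("round to 29", s)));
      (Le 29, (fun s -> ("keep day", s))) ]
```

Here example evaluates to two cases, one per branch, each carrying the part of the state compatible with its condition.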

The proof of soundness of the abstract month addition is not formalized in F*
and omitted for brevity. However, it is a direct application of the characterization 
of ambiguous month additions established in Th. 5, and proved formally in F*. 
The analysis may refine constraints on a day, month or year auxiliary variable. 
These constraints could then entail new constraints on other auxiliary variables 
of the same date to represent only valid dates. This propagation phase is per- 
formed by the strengthening operator described below, which is sound as it only 
removes invalid dates, which are not taken into account by the concretization. 


Strengthening operator. The strengthening operator enforces the following: 

— If the month is February, the day is less than 30. 

— If the month is April, June, September or November, the day is less than 31.

— If the date is February 29, we know the current year is a leap year. We enforce 
that the year number is divisible by 4, which is a necessary condition. 
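The three rules can be sketched on a deliberately simplified abstract date. The record below (a day interval, a finite set of possible months, and a flag for divisibility of the year by 4) is an assumption made for illustration; the actual operator works on the underlying numerical abstract domain.

```ocaml
(* Hypothetical, simplified abstract date. *)
type adate = {
  day : int * int;   (* inclusive interval of possible day numbers *)
  months : int list; (* finite set of possible months, 1 = January *)
  year_div4 : bool;  (* true when the year is known to be divisible by 4 *)
}

(* Largest day number a month can ever have. *)
let max_day m =
  match m with
  | 2 -> 29
  | 4 | 6 | 9 | 11 -> 30
  | _ -> 31

let strengthen (a : adate) : adate =
  let lo, hi = a.day in
  (* Largest day allowed by any of the possible months. *)
  let bound = List.fold_left (fun acc m -> max acc (max_day m)) 0 a.months in
  let hi' = min hi bound in
  (* February 29 can only exist when the year is divisible by 4. *)
  let feb29 = a.months = [ 2 ] && lo = 29 && hi' = 29 in
  { a with day = (lo, hi'); year_div4 = a.year_div4 || feb29 }
```

Only invalid dates are removed: a day bound is tightened only when every possible month forbids the larger days.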


Comparison transfer function. The transfer function for date comparisons 
is dates_lt in Fig. 6; it encodes a lexicographic comparison.
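As an illustration, the same lexicographic case split can be written on concrete dates. In this sketch the record type and the function name are hypothetical; the returned integer is the line of Fig. 6 whose condition holds, so that each of the six partitions is visible.

```ocaml
type date = { year : int; month : int; day : int }

(* Result of d1 < d2, together with the Fig. 6 line of the case that applies. *)
let dates_lt_case (d1 : date) (d2 : date) : bool * int =
  if d1.year < d2.year then (true, 38)
  else if d1.year > d2.year then (false, 39)
  else if d1.month < d2.month then (true, 40)
  else if d1.month > d2.month then (false, 41)
  else if d1.day < d2.day then (true, 42)
  else (false, 44)
```

Exactly one case applies to any pair of concrete dates, which is why the abstract version can treat the six cases as a partition.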


4.2 Instantiating YMD with a combination of numerical domains


The YMD domain is fully generic in the numerical abstract domain it relies on
to translate date constraints into constraints over integers. We describe how we
chose a combination of numerical abstract domains to get the best precision
possible in the presence of non-linearity and unconstrained dates.


 1 type case = expr * state
 2 type cases = case list
 3
 4 let switch abs = List.map (fun (cond : expr, k : state -> case) -> k (assume cond abs))
 5
 6 let is_leap (y : expr) : expr = (y % 4 = 0 && y % 100 <> 0) || (y % 400 = 0)
 7
 8 let round (r : rounding) (d m y : expr) (abs : state) : case =
 9   match r with
10   | RoundDown ->
11     mk_date d m y abs
12   | RoundUp ->
13     let succ_m = 1 + res_month % 12 in
14     let succ_y = y + res_month / 12 in
15     mk_date 1 succ_m succ_y abs
16
17 let add_months (r : rounding) (date : var) (nb_m : int) (abs : state) : cases =
18   let day = day_of date in
19   let month = month_of date in
20   let year = year_of date in
21   let res_month = 1 + (month - 1 + nb_m) % 12 in
22   let res_year = year + (month - 1 + nb_m) / 12 in
23   switch abs
24     [
25       (* Rounding to 29 Feb. of a leap year *)
26       day > 29 && res_month = Feb && is_leap res_year, round r 29 res_month res_year;
27       (* Rounding to 28 Feb. of a non-leap year *)
28       day > 28 && res_month = Feb && not (is_leap res_year), round r 28 res_month res_year;
29       (* Rounding to a 30-day month *)
30       day > 30 && is_one_of res_month [Apr; Jun; Sep; Nov], round r 30 res_month res_year;
31       (* No rounding *)
32       mk_true, mk_date day res_month res_year
33     ]
34
35 let dates_lt (d1 d2 : var) (abs : state) : cases =
36   switch abs
37     [
38       (year_of d1) < (year_of d2), mk_true;
39       (year_of d1) > (year_of d2), mk_false;
40       (year_of d1) = (year_of d2) && (month_of d1 < month_of d2), mk_true;
41       (year_of d1) = (year_of d2) && (month_of d1 > month_of d2), mk_false;
42       (year_of d1) = (year_of d2) && (month_of d1 = month_of d2)
43         && (day_of d1 < day_of d2), mk_true;
44       (year_of d1) = (year_of d2) && (month_of d1 = month_of d2)
45         && (day_of d1 >= day_of d2), mk_false;
46     ]


Fig. 6. Abstract transfer functions for month addition and date comparison


We initially started using intervals and congruences for our first tests. Due to 
its convexity, the interval domain was unable to precisely represent months where 
the day number may be rounded to 30 days during the date-month addition (line 
30 of Fig. 6). Thus, we added a bounded powerset-of-integers domain (of size at
most 4) to be precise enough for this use case. When month is not a constant, the
congruence domain will be unable to precisely represent the resulting month
(line 21 of Fig. 6), or to refine the potential values of month given constraints
on res_month. This situation happens often in our evaluation; it is shown in 
our motivating example. We resolved this precision issue by switching from the 
congruence domain to the relational, linear congruence domain [5]. We also added 
the polyhedra domain [17] to keep track of equalities between different day, 
month and year variables, which happens during analyses on programs with
unconstrained dates, as we will show in the upcoming examples. 
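The convexity issue with intervals can be seen on the set of 30-day months. The sketch below uses plain lists as sets; the helper names and the bound parameter are illustrative assumptions, not the interface of the domains used in the implementation.

```ocaml
(* Months with 30 days, encoded with 1 = January. *)
let thirty_day_months = [ 4; 6; 9; 11 ]

(* Interval abstraction: smallest interval containing every element. *)
let hull (s : int list) : int * int =
  (List.fold_left min max_int s, List.fold_left max min_int s)

let in_interval ((lo, hi) : int * int) (x : int) : bool = lo <= x && x <= hi

(* Bounded powerset abstraction: keep the set itself while it is small
   enough, and give up (None) otherwise. *)
let bounded_set (bound : int) (s : int list) : int list option =
  let s = List.sort_uniq compare s in
  if List.length s <= bound then Some s else None
```

The interval hull of the 30-day months is [4, 11], which wrongly admits May, July, August and October, whereas a powerset bounded by 4 represents the set exactly.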

Our current numerical abstract domain is a reduced product between grids, 
polyhedra, intervals, and a bounded powerset of integers. The relational domains 
rely on the Apron library [27]. The approximation of non-linear computations is 
performed through linearization techniques [37]. 


Example 2. Let us consider the program below picking an arbitrary, uncon- 
strained date d and then adding one month to d. We illustrate the different 
cases of the transfer function add_months in this case, assuming we round down. 


    date d = random_date();
    date d2 = d + [0 years, 1 months, 0 days];


Rounding to 29 Feb. of a leap year. In the first case of the transfer func- 
tion, the numerical domain is able to deduce from the expression day > 29 && 
res_month = Feb that the day of d is either 30 or 31, and the month is January. 
In the rounding down mode, d2 is thus February 29th. The relational domain 
additionally expresses that year(d) = year(d2). 

Rounding to 28 Feb. of a non-leap year. Similar case, omitted for brevity. 

Rounding to a 30-day month. The numerical abstract domain infers that d
represents the 31st of March, May, August or October, tracked thanks to the
bounded set of integers domain. As we round down, we deduce that the day of
d2 is 30, and month(d2) ∈ {Apr, Jun, Sep, Nov}. In that case, the relational domain
can also infer that year(d) = year(d2), as m / 12 will always be zero.

Other cases. In the last case, the intervals and powerset domains cannot express
interesting constraints on d and d2. The relational domains are however
able to capture key relations:

— The day does not change as there is no rounding: day(d) = day(d2). 

— Thanks to the grids domain [5] we can infer linear relations modulo a constant,
and thus that the month of d2 is the month after d, even if the year changes:
month(d2) ≡₁₂ month(d) + 1, where ≡₁₂ denotes congruence modulo 12. Note
that since month(d2) is not a constant, the non-relational congruence domain
is not sufficient to express this relation.

— The year number may be the same, or the successor provided that the month 
of d is December. We lose a bit of precision, as the last month always creates 
a year increase in the concrete. 

\[
12\,\mathrm{year}(d) + \mathrm{month}(d) \le 12\,\mathrm{year}(d2) + 11
\;\wedge\;
12\,\mathrm{year}(d2) \le 12\,\mathrm{year}(d) + \mathrm{month}(d) + 1
\]
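As a sanity check, one can enumerate the twelve possible months and verify that the concrete result of adding one month without rounding satisfies these relations. The sketch below takes both year inequalities as non-strict, which is an assumption about the intended formula; the function names are hypothetical.

```ocaml
(* Concrete year and month after adding one month (month in 1..12),
   following lines 21-22 of Fig. 6 with nb_m = 1. *)
let next_month ((y, m) : int * int) : int * int = (y + m / 12, 1 + m mod 12)

(* The congruence and the two year inequalities of the last case. *)
let relations_hold ((y, m) : int * int) : bool =
  let y2, m2 = next_month (y, m) in
  (m2 - (m + 1)) mod 12 = 0
  && 12 * y + m <= 12 * y2 + 11
  && 12 * y2 <= 12 * y + m + 1

let all_months_ok (y : int) : bool =
  List.for_all (fun m -> relations_hold (y, m)) (List.init 12 (fun i -> i + 1))
```

This only checks that the relations are sound for concrete executions; it says nothing about how precise they are.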


Example 3 (Addition and strengthening). We use our running example from
Fig. 4, and show what the date addition and the strengthening operator yield
for dates birthday and intermediate. In this example, we assume the dates are
rounded up. As we add two years to birthday, two of the four cases described
in the month addition previously presented will not apply; we omit them below.

Rounding to 28 Feb. of a non-leap year. In that case, birthday is a Feb.
29th, and intermediate rounds up to March 1st. We additionally know that
year(birthday) + 2 = year(intermediate). The strengthening ensures that
year(birthday) is divisible by 4.

No rounding. The day and month numbers of birthday and intermediate
are equal. The year condition is similar to the one provided in Ex. 2.


Example 4 (Comparison). Let us continue with our running example, assuming 
we are focusing on the partition where intermediate has been rounded up to 
March 1st (as shown in Ex. 3). In that case, limit is equal to intermediate. 
Assuming the comparison current < limit holds, we have three different cases, 
described by the line number in Fig. 6. Line 38 yields year(current) < year(limit). 
Line 40 enforces year(current) = year(limit), month(current) < month(limit), 
so month(current) ∈ {Jan, Feb}. Line 42 yields that the year and month numbers
of current and limit are the same and day(current) < day(limit). This
last case is impossible given that 1 ≤ day(current) ≤ 31 and day(limit) = 1.


4.3 Lifting to both rounding modes 


The YMD domain operates at a given, fixed rounding mode. In this section, 
we leverage the YMD domain to perform date computations in both rounding 
modes and thus prove rounding-insensitivity. This lifting is inspired by Delmas 
et al. [21], who analyze product programs to prove endianness portability of C 
programs. Here, we keep the product of programs implicit: only the rounding 
mode changes between the two executions we will consider. 

We start by explaining how the concrete semantics are lifted from a single 
rounding mode to both. We assume we have a semantics of expressions (respectively
statements) $\mathbb{E}_r\llbracket \mathit{expr} \rrbracket$ (resp. $\mathbb{S}_r\llbracket \mathit{stmt} \rrbracket$) parameterized by a date rounding
mode $r \in \{\uparrow, \downarrow\}$. They take as input sets of environments ($\mathcal{E} = \mathcal{V} \to \mathit{Val}$) mapping
variables to values (which are either integers or dates), and produce values
(resp. environments).


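\[
\mathbb{E}_r\llbracket \mathit{expr} \rrbracket : \mathcal{P}(\mathcal{E}) \to \mathcal{P}(\mathit{Val})
\qquad
\mathbb{S}_r\llbracket \mathit{stmt} \rrbracket : \mathcal{P}(\mathcal{E}) \to \mathcal{P}(\mathcal{E})
\]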


We define in Fig. 7 the concrete semantics evaluating expressions and statements
over both rounding modes, written respectively $\mathbb{E}_{\updownarrow}\llbracket \mathit{expr} \rrbracket$ and $\mathbb{S}_{\updownarrow}\llbracket \mathit{stmt} \rrbracket$.
We do not delve into the details of product programs, which are defined in
the work of Delmas et al. [21]. In this semantics, the state is now duplicated:
$\mathcal{D} = \mathcal{E} \times \mathcal{E}$. We ensure that random operations return the same value in both
rounding modes, to avoid spurious desynchronizations. The sync predicate re- 
turns true if and only if the expression evaluates to the same values in both 
rounding modes, capturing the rounding-insensitivity of the contained expres- 
sion. We use it in the programs we analyze to target the expressions we want to 
check, as we have already seen in Fig. 4. The evaluation of other expressions is 
performed pointwise on both rounding modes, and similarly for the assignments. 


Definition 4. An expression e is rounding-insensitive in a state d if and only 
if $\mathbb{E}_{\updownarrow}\llbracket \mathtt{sync}(e) \rrbracket(\{d\}) = \{(\mathrm{true}, \mathrm{true})\}$. This property is encoded in programs by
the statement assert (sync(e)). 
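
\[
\begin{aligned}
\mathbb{E}_{\updownarrow}\llbracket \mathit{expr} \rrbracket &: \mathcal{P}(\mathcal{D}) \to \mathcal{P}(\mathit{Val}^2) \\
\mathbb{E}_{\updownarrow}\llbracket \mathtt{random\_date}() \rrbracket(D) &= \{ (d, d) \mid d \in \mathbb{Z}^3,\ \mathrm{valid}(d) \} \\
\mathbb{E}_{\updownarrow}\llbracket \mathtt{sync}(e) \rrbracket(D) &= \bigcup_{(\rho_\uparrow, \rho_\downarrow) \in D} \{ (b_\uparrow == b_\downarrow,\ b_\uparrow == b_\downarrow) \mid (b_\uparrow, b_\downarrow) \in \mathbb{E}_{\updownarrow}\llbracket e \rrbracket(\{(\rho_\uparrow, \rho_\downarrow)\}) \} \\
\mathbb{E}_{\updownarrow}\llbracket \mathit{expr} \rrbracket(D) &= \bigcup_{(\rho_\uparrow, \rho_\downarrow) \in D} \{ (v_\uparrow, v_\downarrow) \mid v_\uparrow \in \mathbb{E}_{\uparrow}\llbracket \mathit{expr} \rrbracket \rho_\uparrow,\ v_\downarrow \in \mathbb{E}_{\downarrow}\llbracket \mathit{expr} \rrbracket \rho_\downarrow \} \\
\mathbb{S}_{\updownarrow}\llbracket \mathit{stmt} \rrbracket &: \mathcal{P}(\mathcal{D}) \to \mathcal{P}(\mathcal{D}) \\
\mathbb{S}_{\updownarrow}\llbracket x = e \rrbracket(D) &= \bigcup_{(\rho_\uparrow, \rho_\downarrow) \in D} \{ (\mathbb{S}_{\uparrow}\llbracket x = v_\uparrow \rrbracket \rho_\uparrow,\ \mathbb{S}_{\downarrow}\llbracket x = v_\downarrow \rrbracket \rho_\downarrow) \mid (v_\uparrow, v_\downarrow) \in \mathbb{E}_{\updownarrow}\llbracket e \rrbracket(\{(\rho_\uparrow, \rho_\downarrow)\}) \}
\end{aligned}
\]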


Fig. 7. Concrete semantics over double evaluation of rounding modes 
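The double evaluation can be pictured with values that carry one component per rounding mode. This is a minimal sketch on single values rather than sets of environments; the record type and function names are hypothetical, and dates are encoded as (year, month, day) triples so that OCaml's structural comparison coincides with the lexicographic date order.

```ocaml
(* One value per rounding mode. *)
type 'a dbl = { up : 'a; down : 'a }

(* Inputs and random values are identical in both modes. *)
let pure (v : 'a) : 'a dbl = { up = v; down = v }

(* Other operations are evaluated pointwise in each mode. *)
let map2 (f : 'a -> 'b -> 'c) (a : 'a dbl) (b : 'b dbl) : 'c dbl =
  { up = f a.up b.up; down = f a.down b.down }

(* sync holds when both rounding modes produce the same value. *)
let sync (v : 'a dbl) : bool = v.up = v.down

(* Values taken from the running example: limit differs between modes. *)
let limit : (int * int * int) dbl = { up = (2022, 3, 1); down = (2022, 2, 1) }
let current : (int * int * int) dbl = pure (2022, 2, 15)
let current_lt_limit : bool dbl = map2 (fun c l -> c < l) current limit
```

With these values the comparison is true when rounding up and false when rounding down, so sync current_lt_limit is false: the expression is rounding-sensitive in that state.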


The abstract semantics mimics the concrete behavior, but works on a sin- 
gle abstract state instead of a set of concrete double states. The double state 
is represented by duplicating variables according to their rounding mode in the 
numerical abstract domain. A variable x is thus written ↑x (resp. ↓x) to represent
the variable when the upper (resp. lower) rounding mode is used. This duplication
is performed in a shallow fashion to improve usability: when performing an
assignment x = e, if e evaluates into the same value in both rounding modes, 
the variable x will not be duplicated into the numerical abstract domain. 


Example 5 (Rounding-sensitivity of the comparison). Back to our running ex- 
ample, we have shown so far how the YMD domain analyzes the program when 
rounding up (Ex. 4). Continuing with the same relational abstract domain, we 
show part of the abstract state in the partition focusing on rounding to Feb. 28 
of a non-leap year in Eq. (1). When rounding down, intermediate rounds
to Feb. 28, and thus limit rounds down to Feb. 1st.


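\[
\begin{aligned}
&\mathrm{day}(\mathit{current}) \in [1, 31],\quad \mathrm{month}(\mathit{current}) \in [1, 12],\quad \mathrm{year}(\mathit{current}) \in [-\infty, +\infty] \\
&\mathrm{day}(\mathit{birthday}) = 29,\quad \mathrm{month}(\mathit{birthday}) = \mathrm{Feb},\quad \mathrm{year}(\mathit{birthday}) \equiv_4 0 \\
&{\uparrow}\mathrm{day}(\mathit{intermediate}) = 1,\quad {\uparrow}\mathrm{month}(\mathit{intermediate}) = \mathrm{Mar} \\
&{\downarrow}\mathrm{day}(\mathit{intermediate}) = 28,\quad {\downarrow}\mathrm{month}(\mathit{intermediate}) = \mathrm{Feb} \\
&{\downarrow}\mathrm{year}(\mathit{intermediate}) = {\uparrow}\mathrm{year}(\mathit{intermediate}) = \mathrm{year}(\mathit{birthday}) + 2 \\
&{\uparrow}\mathrm{day}(\mathit{limit}) = 1,\quad {\uparrow}\mathrm{month}(\mathit{limit}) = \mathrm{Mar},\quad {\downarrow}\mathrm{day}(\mathit{limit}) = 1,\quad {\downarrow}\mathrm{month}(\mathit{limit}) = \mathrm{Feb} \\
&{\downarrow}\mathrm{year}(\mathit{limit}) = {\uparrow}\mathrm{year}(\mathit{limit}) = \mathrm{year}(\mathit{birthday}) + 2
\end{aligned}
\tag{1}
\]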

We exhibit an abstract state where we cannot prove that the expression
current < limit is rounding-insensitive. The static analysis will consider all
cases in the comparison and the evaluation in both rounding modes. For the sake
of presentation here, we only highlight one case. The date comparison operator
between current and the rounded up version of limit yields a partition where
the years are the same and the month number is less. This partition refines the
abstract state above with the following constraints:


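\[
\mathrm{year}(\mathit{current}) = {\uparrow}\mathrm{year}(\mathit{limit}) \;\wedge\; \mathrm{month}(\mathit{current}) < {\uparrow}\mathrm{month}(\mathit{limit}) = \mathrm{Mar}
\tag{2}
\]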

Let us now consider the case where the comparison with the rounded down
version of limit does not hold, when the years and months are the same but 
the days are not. We get the following additional constraints: 


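\[
\mathrm{year}(\mathit{current}) = {\downarrow}\mathrm{year}(\mathit{limit}) \;\wedge\; \mathrm{month}(\mathit{current}) = {\downarrow}\mathrm{month}(\mathit{limit}) = \mathrm{Feb} \;\wedge\; \mathrm{day}(\mathit{current}) \ge {\downarrow}\mathrm{day}(\mathit{limit}) = 1
\tag{3}
\]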


Combining the constraints from Eqs. (2) and (3) on the abstract state from 
Eq. (1) gives the following result on current: 


\[
\mathrm{year}(\mathit{current}) = \mathrm{year}(\mathit{birthday}) + 2 \;\wedge\; \mathrm{month}(\mathit{current}) = \mathrm{Feb}
\tag{4}
\]


To summarize, our analysis has been unable to prove the rounding-insensitivity 
of the expression current < limit, in particular in the case of the abstract state 
presented in Eq. (1), refined with constraints from Eqs. (2) and (3). Thanks to 
partitioning and relational abstract domains, we know that the proof fails when 
birthday is a Feb. 29th (of a year y which is divisible by 4, a sound but
incomplete way to express that y is a leap year). In that case, intermediate will either be
Feb. 28th or March 1st of y + 2. This entails that limit will either be Feb. 1st 
or March 1st of y + 2. In the cases where current is a day of February of y + 2 
(Eq. (4)), the comparison will effectively be rounding-sensitive. 

The original program did not contain any constraints on birthday or current. 
Note that if we add in the program that the day of birthday is less than 28, our 
analysis is able to automatically prove the program to be rounding-insensitive. 


4.4 Implementation 


We implemented our approach in the Mopsa static analysis platform [28, 29]. 
Mopsa is able to analyze C, Python and multilanguage Python/C programs [40, 
41, 44], to prove the absence of runtime errors, and to perform portability anal- 
ysis of C programs [21]. We modified the front-end of a toy imperative language 
also available in Mopsa to analyze programs performing date arithmetic. We 
chose to extend this language for our analysis as we do not require advanced 
features from C nor Python. Thanks to Mopsa’s modular architecture, we have 
been able to reuse iterators for intraprocedural analysis with few code changes.


The configuration used by Mopsa for our analysis is illustrated in Fig. 8.
The “D.bidates” domain corresponds to the abstract domain and transfer
functions described in Sec. 4.3. The “U.ymd” domain is the YMD domain
(Sec. 4.1). The last part enclosed in a gray box corresponds to the numerical
abstract domain on top of which the YMD domain was built (Sec. 4.2).


Fig. 8. Date analysis configuration


5: assert(sync(current < limit));

Desynchronization detected: (current < limit). Hints:
↑month(limit) = 3, ↑day(limit) = 1, ↓month(limit) = 2, ↓day(limit) = 1,
↑month(intermediate) = 3, ↑day(intermediate) = 1,
↓month(intermediate) = 2, ↓day(intermediate) = 28,
month(birthday) = 2, day(birthday) = 29, month(current) = 2, day(current) = [1,29],
year(birthday) =[4] 0, year(current) = ↑year(intermediate) = ↑year(limit)
= ↓year(intermediate) = ↓year(limit) = year(birthday) + 2


Fig. 9. Mopsa’s output on the running example


4.5 Generating counter-example hints


We have extended our implementation to provide counter-example hints when
a synchronization assertion cannot be proved safe. Given our use case, it is
paramount to provide meaningful feedback to users translating law articles into 
Catala code so they understand why their date computations might be ambigu- 
ous (Sec. 5). These hints are precise constraints on the considered program that 
may lead an expression to be rounding-sensitive. They are especially helpful to 
provide more precise date ranges for unconstrained dates that may affect round- 
ing sensitivity. As our approach is incomplete, these hints may be spurious; we 
however did not encounter this issue in our case study on Catala programs. 

This generation of counter-example hints is atypical for static analyses by 
abstract interpretation. This approach is permitted here by a simplified setting 
(variables are assigned once, and the abstract state is partitioned to ensure a 
high precision) and the use of powerful relational abstract domains. In a general 
setting with multiple variable assignments, joins and widenings, most approaches 
need to perform backward analyses [1, 38, 49]. 

This generation of hints works in two steps: it first starts by heuristically se- 
lecting the best partition of the abstract state. The YMD domain may partition 
the abstract state in order to keep the best precision. Our heuristic selects the 
partition with the highest number of desynchronized variables (meaning there 
has been significant roundings), and the highest number of auxiliary variables 
for days and months which are constants. The second step of the hint generation 
extracts the relevant constraints from the considered abstract state. This ex- 
traction starts by collecting all date variables defined in the program. For these 
variables, we evaluate the auxiliary day, month and year variables into intervals, 
and keep only intervals providing meaningful information (i.e., intervals strictly 
included in [1,31] for day variables, strictly included in [1,12] for month vari- 
ables, and bounded intervals for year variables). We then project the relational 
abstract domain onto the set of auxiliary variables where no meaningful intervals
have been extracted to provide linear relations for those. We show in Fig. 9
the exact, unedited output of the hints generated by Mopsa in the case of our 
running example and highlight their readability. They correspond exactly to the 
constraints previously described in Ex. 5. 
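The interval filter of the second step can be sketched as follows. The bound and kind types and the function names are assumptions made for illustration; the sketch only encodes the three criteria stated above (proper sub-interval of [1,31] for days, of [1,12] for months, and bounded intervals for years).

```ocaml
type bound = NegInf | Fin of int | PosInf
type kind = Day | Month | Year

(* [l, h] is a non-empty, proper sub-interval of [lo, hi]. *)
let strictly_within ((lo, hi) : int * int) ((l, h) : int * int) : bool =
  lo <= l && l <= h && h <= hi && (l, h) <> (lo, hi)

(* Does this interval say more than "any valid day, month or year"? *)
let meaningful (k : kind) ((l, h) : bound * bound) : bool =
  match k, l, h with
  | Day, Fin l, Fin h -> strictly_within (1, 31) (l, h)
  | Month, Fin l, Fin h -> strictly_within (1, 12) (l, h)
  | Year, Fin l, Fin h -> l <= h
  | _ -> false
```

On the running example, day(current) = [1,29] is kept, while an unbounded year interval is dropped and left to the relational projection.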


5 Case Study: Application to Catala 


This section highlights how the results and methods established in the previous 
section can be applied in the setting of legal expert systems, and more specifically 
within Catala [34], a recent domain-specific programming language designed to
be understandable by lawyers and close to the structure of legal texts, with 
formal semantics that clearly define its behavior to reduce discrepancies between 
legal texts and their implementation. 

We start by describing rulings and implementations of the law where precise 
and well-defined date arithmetic is paramount to ensure expected results. Then, 
we describe how Catala’s implementation of date rounding has recently evolved: 
from the issues we noticed in Catala’s previous off-the-shelf implementation, to 
the port to our date calculation library and the introduction of a function-local 
rounding definition when legal references or interpretations are known, reducing 
the number of cases where the rounding mode is unspecified. We finish by ex- 
plaining the latest implemented feature, which allows the Catala compiler to ex- 
tract date computations and relies on Mopsa to (dis)prove rounding-insensitivity. 


5.1 Date arithmetic and the law 


Critical software relying on date computations is commonly used by companies 
or government agencies to automatically enforce legal dispositions, e.g., to check 
if an application has been filed within the correct time period, to compute age- 
related conditions, or to aggregate periods between dates and compare the result 
to a fixed duration for eligibility calculation. 

In all these cases, there can be heavy financial and legal consequences when 
a date computation goes wrong or is subject to diverging interpretation. In the 
case Bowles v. Russell, 551 U.S. 205 (2007) cited by Bailey [7], the court gave 
Bowles a 17-day notice to file an appeal but this notice was incorrectly computed
from Rule 4(a)(6) and paragraph 2107(c), as it should have been 14 days. When
Bowles filed his appeal on the 17th day, the court system dismissed the appeal
on the basis that Bowles should have filed on the 14th day and not trusted the
notice the court gave him earlier. In more mundane cases, an incorrect date 
computation can deprive someone of their social benefits, or impose a higher 
late fee than what should be. 

These doubts about date computation in software applying the law are all
the more concerning as previous research on code open-sourced by French government
agencies did not reveal a great deal of transparency or trustworthy practices
on that particular matter. For instance, the custom programming language M,
used by the French tax authority to compute income tax [35], encodes dates 
as mere floating-point numbers where the date is just a decimal number in the 
format DDMMYYYY. The French unemployment agency, whose IT system is 
mostly implemented in Java, uses a custom date library for its computation 
(fr.unedic.util.temps.Damj) but its implementation is omitted from their 
only open-source release [47]. 


5.2 Catala’s policy about date rounding 


Recently, the Catala project [24, 34] has aimed to bring more accountability and 
transparency to programs computing taxes or social benefits inside government 
agencies. The Catala language is specifically designed to allow the easy translation
of computational law into code; in particular, it is based on prioritized
default logic [10], which enables programmers to closely follow the base case/ex- 
ception pattern that permeates the law. To increase confidence and explainability 
in its programs, Catala also comes with a formal semantics which is formalized in 
the F* proof assistant. These formal semantics mostly focus on Catala’s default 
calculus, the encoding of prioritized default logic as a programming language, 
and do not specify all Catala expressions, including date computations. 


Initially, the semantics of the date computations was defined by the behavior 
of the calendar OCaml library [50] used inside its interpreter. However, this 
library relies on the POSIX behavior which is not always monotonic and may 
appear quirky (for instance, it computes Jan 31st + 1 month as March 3rd for 
non-leap years) despite its very complete set of features. These unusual behaviors 
prompted a deeper investigation about the corner cases of date computations 
and led to the implementation of the library presented in this paper. While 
now integrated in the Catala interpreter, our library is standalone, and freely 
available with an open-source license. As the Catala compiler is implemented in 
OCaml, so is our library⁵, currently packaged with opam; however, by relying
on our semantics, its implementation is straightforward. We do not foresee any 
difficulty porting it to other languages, and plan to do so to support more of the 
Catala backends, including Python and JavaScript. 


The default behavior of our date computation library inside the Catala in- 
terpreter is to raise a runtime exception whenever a date rounding is needed 
during a computation. This choice of behavior has been made conservatively be- 
cause the decision to round up or down date computations in software enforcing 
legal rules is itself a legal rule that has to be specified, as we described in the 
introduction of this paper. To avoid runtime exceptions, rounding rules can be 
specified at the scope level (a precise definition of Catala’s scopes is beyond the
reach of this paper, but a scope can be thought of as a kind of function in Catala) and
should be justified, for example by a legal reference or interpretation. 
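The policy of failing by default unless a rounding mode is given explicitly can be sketched as follows. The exception, the constructor names and the add_years function are hypothetical and do not reproduce the API of the library; the sketch is restricted to year additions, where the only ambiguous case is February 29th landing in a non-leap year.

```ocaml
exception Ambiguous_date

type rounding = RoundUp | RoundDown | Abort

let is_leap y = (y mod 4 = 0 && y mod 100 <> 0) || y mod 400 = 0

(* Add n years to a (year, month, day) date. Without an explicit rounding
   mode, an ambiguous computation raises instead of silently choosing. *)
let add_years ?(round = Abort) ((y, m, d) : int * int * int) (n : int) :
    int * int * int =
  let y' = y + n in
  if m = 2 && d = 29 && not (is_leap y') then
    match round with
    | RoundUp -> (y', 3, 1)
    | RoundDown -> (y', 2, 28)
    | Abort -> raise Ambiguous_date
  else (y', m, d)
```

A caller that has a legal justification for a rounding mode passes it explicitly, for example add_years ~round:RoundUp (2004, 2, 29) 18; every other caller is forced to handle the exception.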


We applied this methodology to fix the code of the biggest Catala program so 
far, which computes the French housing benefits [32]. Articles L822-4, R823-4 of 
the construction and housing Code, as well as article L512-3 of the social security 
Code, all feature a comparison of the age of the user to an age constant. However, 
as the input to the Catala program is not the age of the user but their birth date, 
we know such a comparison can be ambiguous if the user was born on February
29th of a leap year and if the current date is March 1st. In those situations, we
took the decision to round up the date addition, as shown in Fig. 10, with the 
date rounding increasing mention. We are currently trying to contact the 
relevant government agencies operating the system for clarifications about how 
this issue should be handled. 


5 Our F* formalization can be extracted to executable but non-idiomatic OCaml code. 
In practice, we thus manually reimplement our library in OCaml to use features such 
as named arguments or exceptions to provide a more idiomatic API. 

declaration scope CheckingAgeInferiorEqual :
  input birth_date content date
  input current_date content date
  input target_age content duration # always a number of years
  output age_is_inferior_or_equal_target content boolean

scope CheckingAgeInferiorEqual:
  definition age_is_inferior_or_equal_target equals
    birth_date + target_age <= current_date
  date rounding increasing


Fig. 10. Catala code for checking that the age of the user is lower than a constant


[Figure: Catala → Slicing → date-sensitive computations → Prog. gen. → progs.u → Mopsa → Hints]


Fig. 11. Catala date ambiguity analysis pipeline


To best benefit the recipient and be in line with the general principle under- 
pinning legal interpretations of social security law in France, a better solution 
would be to perform the computation twice, by rounding up and down, and 
select the outcome most favorable to the user in case of disagreement. The flex- 
ibility offered by our library allows us to do that, and we intend to explore this 
avenue in future work. Being able to control precisely where the rounding is 
done and how is key for developers and maintainers of such programs, as they 
are responsible for the legal effect of the program itself [22]. 


5.3 Detecting potentially ambiguous computations 


Choosing the rounding mode for each date computation allows us to precisely 
control the outcome of ambiguous computations. However, given the pervasive- 
ness of such computations in legal texts, it is also extremely tedious, and figuring 
out the cases where an ambiguous computation could happen is complex. For 
these reasons, we expect some developers to delay this step and wait for inci- 
dents to figure out the policy of the institution operating the program on the 
matter. But figuring out this policy might itself be tricky because of the automa- 
tion frontier [33] strictly separating the developers from the decision-makers in 
charge of legal policy decisions. 

To help developers reach out to the legal services of their institution with 
concrete examples of where things can go wrong before production incidents, we 
integrated the semantics and abstract domains presented in this paper inside 
the ongoing initiative to provide a proof platform for Catala programs [20]. By 
connecting the Catala compiler to the Mopsa static analyzer, we are able to check 
whether a date computation can be ambiguous in the context of the program, 
and often exhibit a counter-example if it is the case. We present in Fig. 11 our 
analysis pipeline. It consists of three main phases: program slicing, verification 
condition crafting, and analysis — which may generate counter-examples. 

First, we scan the Catala program in one of its intermediate representations
and look for Catala expressions liable to raise a runtime exception because
of an ambiguous date computation. We use classic techniques of program
slicing for this step, selecting only the target sub-expression and then adding the 
definitions of variables used in that sub-expression recursively to extract a small, 
self-contained program with sufficient information to be analyzed. This will sim- 
plify the counter-example hint generation of Mopsa, which outputs constraints 
on variables rather than subexpressions of a computation. 
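The slicing step amounts to a transitive closure over variable definitions. Below is a minimal sketch on a hypothetical representation where each definition only records its name and the variables it uses; it does not reflect the Catala compiler's intermediate representations.

```ocaml
module S = Set.Make (String)

(* Hypothetical view of a definition: its name and the variables it uses. *)
type def = { name : string; uses : string list }

(* Keep the definitions that the target variables depend on, transitively,
   in their original order. *)
let slice (defs : def list) (targets : string list) : def list =
  let rec close needed = function
    | [] -> needed
    | v :: rest ->
      if S.mem v needed then close needed rest
      else
        let uses =
          match List.find_opt (fun d -> d.name = v) defs with
          | Some d -> d.uses
          | None -> []
        in
        close (S.add v needed) (uses @ rest)
  in
  let needed = close S.empty targets in
  List.filter (fun d -> S.mem d.name needed) defs
```

Slicing on the variables of a comparison such as current < limit keeps the chain of definitions leading to limit and drops unrelated ones.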

Second, we augment the sliced program with the assertions and other infor- 
mation about its variables that are declared in the original Catala program to 
further constrain the search space. So far, our analysis is intraprocedural, but we 
are planning to implement an inlining pass to make it inter-procedural. We then 
translate the sliced program to Mopsa’s toy language (using the .u extension), 
which can then be fed to the static analyzer. 

Finally, we run Mopsa on the generated program. As we have mentioned in 
Sec. 4.5, Mopsa is able to exhibit potential counter-example hints. While these
hints are approximate due to incompleteness of the analysis, they are often suffi- 
cient to yield real, actionable counter-examples on the Catala programs that we 
analyzed. We extract relevant intervals and linear constraints and display them 
to the user, in the format illustrated by Fig. 9. While the intervals and constraints 
presented are descriptive, and sufficient for a programmer to identify concrete 
counter-examples, they can however be difficult to grasp for non-experts. For- 
matting these constraints in a more readable format is an interesting question, 
requiring further interaction with lawyers; we leave it as future work. 

The implementation of housing benefits in Catala currently consists of about 
20,000 lines (including the text of the law directly specifying it) that were writ- 
ten prior to this work. While automatically analyzing this implementation using 
our verification pipeline, we found issues in two date computations (one of them 
being our running example). In both cases, Mopsa was able to provide
actionable counter-example hints. Several other computations were age
computations, which are now handled by a custom scope with a legally
interpreted date rounding mode, as shown in Fig. 10. Finally, the remaining
computations rely on durations defined outside of the analyzed scope; handling
them requires an inter-scope analysis in Catala, which is being implemented.
In the meantime, we performed a manual duration extraction in these cases and
detected 16 new unsafe (rounding-sensitive) date comparisons, which are real
issues. In all cases, the provided counter-example hints are actionable. In 10
cases, the issues can only happen with a current date before 2023: once the
year is constrained to be greater than or equal to 2023, these 10 cases are
proved safe. All the date arithmetic programs we have extracted or written so
far are small and are analyzed within three seconds.

As the number of Catala programs grows, we hope to apply our analyzer at 
a larger scale, possibly suggesting future avenues for improvement. 


6 Related Work 


We start by surveying the behavior of mainstream implementations of date
arithmetic. We created a suite of litmus tests involving date-duration
additions, together with the expected result for each rounding mode. We wrote
a test driver for each library and ran those tests to determine which rounding
mode applies.


The java.time library [43] provides a LocalDate class for dates and a Period
class to express durations. In our tests, the addition is performed by rounding
down. This behavior is explicitly described in the documentation [26]. To the
best of our knowledge, there is no option to use another rounding mode, or to
fail during ambiguous computations. In the Python standard library, the
datetime module [46] provides a date class and a timedelta class to express
durations. However, these durations cannot be defined in terms of months, but
only in terms of days. A third-party library called dateutil [45] provides a
replacement feature, relativedelta, able to express durations in months and
years. This library seems widely used, as it ranks within the top 20 most
downloaded Python packages. In our tests, this library rounds down. This seems
to be confirmed by the documentation stating that “adding one month will never
cross the month boundary.” As with Java, this rounding behavior is not
configurable. The Boost C++ [9] and the luxon [31] JavaScript libraries exhibit
similar behaviors.
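
To make the idea of a litmus test concrete, here is a minimal sketch using only
the Python standard library. The test table, the `classify` driver and the
candidate function `add_months_round_down` are assumptions written for this
illustration; they are not the test suite used for the survey above.

```python
import calendar
from datetime import date
from typing import Callable


def add_months_round_down(d: date, months: int) -> date:
    """Hypothetical implementation under test: clamp the day to the last
    valid day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


# (start date, months added, result when rounding down, result when rounding up)
LITMUS = [
    (date(2023, 1, 31), 1, date(2023, 2, 28), date(2023, 3, 1)),
    (date(2023, 3, 31), 1, date(2023, 4, 30), date(2023, 5, 1)),
    (date(2004, 2, 29), 12, date(2005, 2, 28), date(2005, 3, 1)),
]


def classify(add: Callable[[date, int], date]) -> str:
    """Run every litmus test and report which rounding mode matches."""
    results = [(add(d, m), down, up) for d, m, down, up in LITMUS]
    if all(r == down for r, down, _ in results):
        return "down"
    if all(r == up for r, _, up in results):
        return "up"
    return "other"


print(classify(add_months_round_down))
```

Each test uses an ambiguous addition, so a library that rounds consistently
matches exactly one of the two expected columns; any other outcome is reported
as "other".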


The coreutils implementation of date arithmetic follows a different
principle, which is not expressible in our semantics. When adding months, this
implementation first computes an adjusted date, which might not be valid. This
adjusted date d′ is then normalized using POSIX’s mktime function. For
example, adding one month to 2023-03-31 yields the adjusted date 2023-04-31,
which does not exist and is normalized into 2023-05-01. In this case, the
behavior is the same as rounding up. There are however cases where its
behavior differs: adding one month to 2023-01-31 yields the adjusted date
2023-02-31, which is normalized into 2023-03-03. This behavior breaks the
monotonicity of the addition in the date argument (2023-02-01 + 1 MONTH is
2023-03-01). In ambiguous computations, the debug mode of the date utility
outputs a warning with the following message: “when adding relative
months/years, it is recommended to specify the 15th of the months”, which is a
sufficient condition to avoid any ambiguity. This semantics is also followed
by the calendar [50] library of OCaml.
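
The normalization described above can be modeled in a few lines. This is a
sketch of the arithmetic only, under the assumption that excess days simply
spill into the following month; the function name `add_months_normalizing` is
hypothetical and the code is not taken from coreutils.

```python
from datetime import date, timedelta


def add_months_normalizing(d: date, months: int) -> date:
    """Model of adjust-then-normalize month addition: days that overflow the
    target month are carried into the following month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    # Start on the first day of the target month, then advance by the
    # original day offset, which may cross the month boundary.
    return date(year, month0 + 1, 1) + timedelta(days=d.day - 1)


print(add_months_normalizing(date(2023, 3, 31), 1))  # 2023-05-01
print(add_months_normalizing(date(2023, 1, 31), 1))  # 2023-03-03
print(add_months_normalizing(date(2023, 2, 1), 1))   # 2023-03-01
```

The last two lines show the loss of monotonicity: 2023-01-31 is earlier than
2023-02-01, yet its result is later.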


We finish this survey with the case of spreadsheet editors (such as Google 
Sheets), and highlight an inconsistent behavior we have found in them. The 
EDATE function adds a given number of months to a date. In our experiments, 
this function silently rounds down. As such, adding 18 years (that is, 216 months) 
to 2004-02-29 yields the date 2022-02-28. These spreadsheet applications also
offer the DATEDIF function, which can compute the duration in years between
two dates. Here, DATEDIF(2004-02-29, 2022-02-28) yields 17 years (18 years
are reached when the second date is 2022-03-01). This behavior is inconsistent
with EDATE. Cheng and Rival [11] focus on performing a type analysis of
spreadsheet applications, given that runtime type casts may silently happen
and provide unwanted results (similarly to what JavaScript does). This
analysis supports a variety of types, including dates, but as it focuses on
type information, it does not address the value semantics of operations on
dates.
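
The inconsistency between the two spreadsheet functions can be reproduced with
a small model. The functions `edate` and `datedif_years` below are assumptions
that mimic the behavior observed in the experiments above (round-down month
addition, and a count of fully elapsed years); they are not the spreadsheet
implementations.

```python
import calendar
from datetime import date


def edate(start: date, months: int) -> date:
    """Model of EDATE as observed: add months, rounding down."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def datedif_years(start: date, end: date) -> int:
    """Model of DATEDIF in years as observed: a year counts only once the
    (month, day) of `end` has reached the (month, day) of `start`."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


birth = date(2004, 2, 29)
eighteenth = edate(birth, 216)            # 18 years later, rounded down
print(eighteenth)                         # 2022-02-28
print(datedif_years(birth, eighteenth))   # 17
print(datedif_years(birth, date(2022, 3, 1)))  # 18
```

In this model, adding 18 years with `edate` and then measuring the elapsed
years with `datedif_years` gives 17, which is the inconsistency noted in the
text.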


The book of Reingold and Dershowitz [48] can be seen as the hacker’s delight
of calendar computations, with many efficient formulas for day additions, and a
wide range of different calendars being presented. Their work neither mentions
nor addresses the issue of month addition and potential date rounding, which is
at the core of our work. Although we have not needed it so far, we could
leverage their approach to optimize the recursive computations of our library.
Similarly, ISO 8601 defines the representation of dates in the Gregorian
calendar, but does not address date-duration additions with years or months.

The Formal Vindications start-up developed a mechanized, formally verified
implementation of a time management library [2, 3] in Coq, computing over
dates and time, including specific technical points (timezones, leap seconds).
Their duration of a month is defined as 30 days. Some recent changes allow
dates to be rounded down. A similar effort was developed in Lean 4 by Bailey
[6], but this library only supports the addition of days to a date. As a
reminder, the Catala project currently targets laws that do not need to go
beyond the precision of a day in terms of time management. Formal Vindications
also developed a formally verified, high-precision tachograph software for
enforcing truck drivers’ scheduling laws [19].

We finish this related work by highlighting similarities between floating-point 
and date arithmetic. Floating-point arithmetic is more complex and widely used, 
but both settings have rounding operators with different modes available. This 
similarity has guided us in our search for properties that hold and counter- 
examples presented in Sec. 3. The static analysis to prove non-ambiguity of date 
computations presented in Sec. 4 can be seen as the abstract execution of the 
computation under both rounding modes, to compare results. To the best of our
knowledge, no static analysis for floating-point programs tries to bound the
difference in computations between two rounding modes. Tools such as Daisy
[8, 18], Fluctuat [23] and FPTaylor [51] usually aim at upper-bounding the
errors between an ideal computation over the reals and a machine computation
using floating-point arithmetic.
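
The idea of running a computation under both rounding modes and comparing the
results can be sketched concretely. The code below is a brute-force, concrete
enumeration and not the abstract-interpretation analysis of Sec. 4; the
functions `add_months` and `sensitive_inputs` and the comparison being checked
are assumptions chosen for illustration.

```python
import calendar
from datetime import date, timedelta
from typing import Iterator


def add_months(d: date, months: int, mode: str) -> date:
    """Add months, rounding an invalid day down (last day of the month) or
    up (first day of the following month)."""
    if mode not in ("down", "up"):
        raise ValueError(f"unknown rounding mode: {mode}")
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    if d.day <= last_day:
        return date(year, month, d.day)  # unambiguous addition
    end_of_month = date(year, month, last_day)
    return end_of_month if mode == "down" else end_of_month + timedelta(days=1)


def sensitive_inputs(start: date, end: date, months: int,
                     current: date) -> Iterator[date]:
    """Yield the dates d in [start, end] for which the comparison
    `d + months <= current` depends on the rounding mode."""
    d = start
    while d <= end:
        down = add_months(d, months, "down") <= current
        up = add_months(d, months, "up") <= current
        if down != up:
            yield d
        d += timedelta(days=1)


january = (date(2023, 1, 1), date(2023, 1, 31))
print(list(sensitive_inputs(*january, 1, date(2023, 2, 28))))
print(list(sensitive_inputs(*january, 1, date(2023, 3, 1))))
```

With a current date of 2023-02-28, the comparison is rounding-sensitive for
2023-01-29, 2023-01-30 and 2023-01-31; with 2023-03-01 it is insensitive for
the whole month. A static analysis reaches such conclusions without
enumerating inputs.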


7 Conclusion and Future Work 


Legal expert systems rely on date computations, which are ambiguously de- 
fined in some corner cases. There are different ways of solving these ambiguities 
through different rounding operators, where no operator prevails over the oth- 
ers. We have thus defined semantics for date computations, taking into account 
these ambiguities to either raise errors, or round the result (either up or down). 
This semantics has been implemented in a publicly available OCaml library.
We have studied this semantics and have formally proved several properties it
satisfies, and exhibited counter-examples to usual properties it does not
satisfy.
We have defined and implemented an analysis that is able to prove an expression
to be rounding-insensitive in a given program. This analysis relies on
partitioning and relational abstract domains to maintain the best possible
precision, and can generate understandable counter-example hints. Both our
library and the rounding-sensitivity analysis have been integrated within the
Catala language,
which focuses on implementing computational laws. Through our analysis, we
found rounding-sensitivity issues in the implementation of the French housing
benefits in Catala. We surveyed the behavior of mainstream date arithmetic 
libraries, and developed litmus tests that can be used to test new libraries. 

There are limitations to our static analysis: its soundness has not been proved
mechanically, but the proofs simply lift theorems that have been formally
verified. The language currently analyzed is a core imperative language, which
was sufficient for our case studies. Having an inter-scope analysis within the
Catala to Mopsa translation would improve our precision in the case study. We
plan to craft human-readable error messages from Mopsa’s output. We believe
the relevant constraints are already properly extracted by Mopsa and that the
rest of the work consists of engineering, in order to invert the translation
from Catala date computations to Mopsa programs.

In spite of these limitations, we believe this paper to be a crucial step
towards clarifying and improving the robustness of the many computer programs
implementing “business logic”, which are often overlooked by formal methods.
The widespread use of date arithmetic in programs used by companies or
government agencies to operate massive financial transfers should have prompted
a formal analysis of date rounding a long time ago, but the existing literature
only indicates a recent interest from the formal methods community in the
matter.

This work was triggered by the problems we found during interdisciplinary 
investigations about French housing benefits using the Catala programming
language. From these investigations surfaced the need for various formal
analyses, which we have thus started integrating into the programming language.
We hope
to further develop the integration of static analysis into the Catala proof plat- 
form, thus benefiting both legal and computer science users by including formal 
methods advances into development processes of Catala programs. 


Artifact Availability Statement. All our development is under open-source 
licenses, public or in the process of being upstreamed into a public development. 
To foster reproducibility of our results, we provide an artifact [39] containing
the formal proofs written in F*, our date calculation library, and our ambiguity 
detection analysis as well as supporting evidence of our case study. 